CHAPTER 1

SETTING  THE  CONTEXT 


Private  purchasers  of  health  insurance  have  been  one  of  the  key  forces  in  stimulating  the 
development  of  systematic  methods  to  evaluate  the  quality  of  care  provided  in  managed  care. 
There  are  at  least  two  reasons  for  purchasers’  interest  in  quality.  First,  as  employers  began 
selecting  health  insurance  options,  such  as  managed  care  organizations,  that  promised  to  reduce 
the rate of increase in premiums, employees expressed concern that the quality of care would
decline.  The  fear  among  consumers  was  that  limits  on  choice  and  financial  incentives  to  limit 
utilization,  the  hallmarks  of  early  managed  care  organizations,  would  translate  into  poorer 
quality.  Second,  as  the  cost  of  providing  health  insurance  began  to  consume  larger  portions  of 
the  budgets  for  producing  goods  and  services,  employers  began  to  view  health  insurance  as  an 
“input”  much  like  windshields,  tires,  or  electrical  systems.  Purchasers  have  become  accustomed 
to  assessing,  monitoring,  and  continuing  to  improve  the  quality  of  other  inputs  to  production 
processes,  so  viewing  health  through  a  new  lens  was  an  obvious  next  step.  At  the  same  time, 
many of the techniques for creating improvement in industrial processes began to be adapted for
application  in  the  health  sector,  suggesting  that  improved  monitoring  could  stimulate  better 
service  delivery. 

Until recently, little information has been systematically available on the quality of care
delivered in the U.S. Most of what is known comes from research studies that focus on a narrow
range of conditions in a small number of settings (Schuster, McGlynn, Brook, 199 _ ). The
National Committee for Quality Assurance (NCQA) responded to demands for better
information by facilitating a collaboration among purchasers, managed care plans, and
consumers to develop a method, the Health Plan Employer Data and Information Set (HEDIS), to
systematically collect, analyze, and report data on health plan performance. NCQA also accredits
managed care organizations. These efforts have substantially increased the amount of
information available on health plan performance and have led to the next problem: making
sense of the information.

In  the  Spring  of  1998,  the  “Big  Three”  automobile  manufacturers — DaimlerChrysler, 
Ford,  and  General  Motors — and  the  United  Auto  Workers  (UAW)  agreed  to  collaborate  on 
developing  a  method  for  summarizing  the  performance  of  managed  care  plans.  All  four  had 
previously  undertaken  such  efforts,  but  because  the  methods  used  by  each  one  were  different,  the 
results  also  varied.  For  example,  in  the  1997  open  enrollment  materials  for  Ford  and  GM,  three 
of  the  performance  categories  were  similar:  prevention  or  effectiveness  of  care,  access  to  care, 
and  consumer  satisfaction.  But  an  examination  of  the  “scores”  for  40  plans  with  which  both 
companies contracted revealed agreement on ratings for fewer than half of the plans. Because
many  households  in  southeast  Michigan  include  family  members  who  work  for  more  than  one 
automobile  manufacturer,  the  potential  for  confusion  resulting  from  these  apparent  mixed 
messages  is  significant.  Such  confusion  is  likely  to  undermine  the  shared  vision  of  all  four 
organizations  that  information  on  quality  will  help  consumers  make  better  choices.  The  UAW 
played  a  key  role  in  facilitating  this  collaboration  because  its  members  work  for  all  three 
automobile manufacturers, which makes consistent messages to its membership
essential. The automobile manufacturers and the UAW have a history of collaboration on
providing  information  to  consumers  to  facilitate  decision  making  and  this  project  was  a  natural 
outgrowth  of  these  prior  efforts. 
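
The comparison described above can be sketched in a few lines. This is a minimal illustration only, not the method used by Ford, GM, or RAND: the plan names and rating labels are hypothetical, and "agreement" is assumed to mean that both purchasers assigned a plan the identical rating label.

```python
def agreement_rate(ratings_a, ratings_b):
    """Fraction of plans rated by both purchasers that received the same rating."""
    shared_plans = sorted(set(ratings_a) & set(ratings_b))
    if not shared_plans:
        raise ValueError("the two purchasers have no plans in common")
    matches = sum(1 for plan in shared_plans if ratings_a[plan] == ratings_b[plan])
    return matches / len(shared_plans)


# Hypothetical ratings for one performance category (not data from the report).
ford_ratings = {
    "Plan A": "above average",
    "Plan B": "average",
    "Plan C": "below average",
    "Plan D": "average",
}
gm_ratings = {
    "Plan A": "above average",
    "Plan B": "below average",
    "Plan C": "average",
    "Plan D": "average",
    "Plan E": "average",  # rated by only one purchaser, so it is ignored
}

rate = agreement_rate(ford_ratings, gm_ratings)
print(f"Agreement on {rate:.0%} of shared plans")
```

Restricting the calculation to plans with which both purchasers contract mirrors the comparison in the text, which looked only at the plans the two companies had in common.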

A  team  at  RAND  was  selected  to  develop  the  method  for  reporting  on  health  plan 
performance  under  the  direction  of  the  Coordinated  Autos/UAW  Reporting  System  (CARS) 
Steering  Committee  consisting  of  representatives  from  each  of  the  three  automobile 
manufacturers,  the  United  Auto  Workers  (UAW),  the  State  of  Michigan  (both  as  an  employer 
and  as  a  purchaser  of  services  for  Medicaid  enrollees),  and  the  Greater  Detroit  Area  Health 
Council  (GDAHC).  In  addition,  RAND  collaborated  with  the  National  Committee  for  Quality 
Assurance  (NCQA)  and  the  Foundation  for  Accountability  (FACCT)  through  a  technical 
advisory  committee.  RAND,  NCQA,  and  FACCT  have  worked  together  on  other  quality 
measurement  and  reporting  projects.  Over  the  four  and  one-half  months  of  the  project,  the 
CARS  Steering  Committee  met  four  times  to  discuss  policy  questions  and  provide  direction  and 
feedback  on  the  project. 

The  purpose  of  this  report  is  to  describe  the  method  that  was  developed  and  the  rationale 
for  the  choices  that  were  made.  The  subsequent  chapters  discuss  creating  a  framework  for 
reporting  on  performance  (Chapter  2),  the  sources  of  data  on  performance  that  were  considered 
and  eventually  selected  (Chapter  3),  the  method  by  which  individual  measures  were  assigned  to 
categories  (Chapter  4),  methods  for  aggregating  individual  measures  into  summary  scores 
(Chapter  5),  choosing  the  reporting  strategy  (Chapter  6),  obtaining  data  from  health  plans 
(Chapter  7),  the  final  results  that  were  reported  to  the  CARS  Steering  Committee  on  August  14, 
1998  (Chapter  8),  and  a  discussion  of  the  project  with  recommendations  for  next  steps  (Chapter 
9). 


CHAPTER  2 

DEVELOPING  A  FRAMEWORK  FOR  REPORTING  ON  PERFORMANCE 


WHY  IS  A  FRAMEWORK  USEFUL? 

One  of  the  motivations  for  the  current  project,  as  well  as  numerous  efforts  in  the  popular 
press  (e.g.,  U.S.  News  and  World  Report,  Newsweek,  Consumer  Reports),  was  to  summarize  a 
variety  of  individual  measures  about  health  plan  performance  into  a  few  dimensions.  Why  is  this 
necessary? Even the brightest human being can hold only a few pieces of information in
short-term memory when making a decision. Cognitive psychologists suggest that about five to seven
bits of data can be utilized when making a decision. Further, hierarchical structures that organize
specific  details  within  a  general  framework  facilitate  the  use  of  information  in  three  ways.  First, 
hierarchies  facilitate  comprehension.  Second,  hierarchies  help  people  memorize  information  and 
retrieve  that  information  for  later  use.  Third,  hierarchies  communicate  importance.  The 
framework  used  for  the  CARS  results,  thus,  should  have  a  few  categories  and  should  organize 
information  in  a  way  that  is  useful  for  decision  makers. 

WHAT  APPROACHES  CAN  BE  USED  TO  CREATE  A  FRAMEWORK? 

There  are  two  different  strategies  for  creating  frameworks.  The  first  approach,  which 
might  be  called  “bottom-up,”  starts  with  the  individual  measures  that  are  available  and  creates 
summary  categories  that  maximize  the  number  of  measures  used.  This  can  either  be  done 
quantitatively,  using  factor  analysis  or  other  methods  designed  to  identify  patterns  in  data,  or  it 
can  be  done  qualitatively  by  obtaining  expert  opinion.  The  second  approach,  which  might  be 
called  “top-down,”  starts  with  the  information  that  potential  users  would  like  to  have  to  make 
decisions  and  identifies  measures  that  communicate  the  desired  information.  The  methods  for 
identifying  what  information  the  target  audience  wants  may  include  surveys,  focus  groups,  or 
semi-structured  interviews. 

The  bottom-up  approach  is  more  frequently  associated  with  research  or  decision  analysis. 
This  approach  has  the  advantage  of  trying  to  use  all  available  information.  Since  the  approach  is 
empirically  driven,  another  advantage  is  the  opportunity  to  identify  patterns  in  data  that  might 
otherwise  have  escaped  notice.  The  disadvantage  of  this  approach,  particularly  if  done 
quantitatively  (e.g.,  using  factor  analysis),  is  that  it  may  produce  results  that  are  difficult  to 
interpret  and  may  not  be  valued  by  the  intended  audience. 
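
To make the quantitative bottom-up idea concrete, the sketch below groups measures whose plan-level scores move together. It is a deliberately simplified stand-in for factor analysis (a greedy grouping on Pearson correlations), not the technique used in any of the frameworks discussed in this chapter. The measure names, scores, and the 0.8 threshold are all hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of plan scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def group_measures(scores, threshold=0.8):
    """Greedy grouping: a measure joins the first group whose first member
    it correlates with at or above the threshold; otherwise it starts a group."""
    groups = []
    for name in scores:
        for group in groups:
            if pearson(scores[name], scores[group[0]]) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical scores for five plans on four measures (not data from the report).
scores = {
    "breast_cancer_screening": [70, 75, 80, 85, 90],
    "cervical_cancer_screening": [68, 74, 79, 86, 91],
    "overall_satisfaction": [80, 60, 75, 65, 70],
    "would_recommend_plan": [82, 58, 77, 66, 69],
}

for group in group_measures(scores):
    print(group)
```

Even in this toy form, the disadvantage noted above is visible: the groups that emerge are defined by statistical patterns, and someone still has to decide whether they correspond to categories that the intended audience would recognize and value.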

The  top-down  approach  is  more  audience  sensitive  because  it  identifies  attributes  that  are 
important  to  those  making  the  decision.  Because  decision-makers  generally  come  to  a  task  with 
some  questions  already  in  mind,  an  optimal  top-down  approach  organizes  information  into 
categories  that  respond  to  the  questions  on  the  minds  of  potential  users.  The  disadvantage  of  this 
approach is that there may be categories of interest to decision makers for which few or no
measures currently exist.

HOW  DO  THESE  TWO  APPROACHES  COMPARE? 

Four  existing  frameworks  illustrate  the  choices  that  were  available  when  this  project  was 
started:  Ford  1997,  U.S.  News  and  World  Report,  FACCT,  and  NCQA.  The  methods  by  which 
each  framework  was  developed  and  the  resulting  categories  are  discussed  in  this  section. 

Ford  1997 

In  1997,  the  Ford  Motor  Company  contracted  with  RAND  to  develop  a  method  for 
summarizing  the  available  performance  information  for  both  the  managers  choosing  plans  to 
offer  Ford  employees,  and  for  its  employees  to  use  in  making  decisions  during  open  enrollment. 
Ford  was  interested  in  maximizing  the  use  of  information,  so  a  bottom-up  strategy  was 
employed.  The  RAND  team  examined  the  available  measures  from  HEDIS®,  NCQA’s  Member 
Satisfaction  Survey,  NCQA  Accreditation,  GeoAccess,  and  the  Ford  Direct  Questionnaire  and 
recommended that summary scores be developed on performance in six categories: effectiveness
of care, access to care, consumer assessments of care, organizational structure, targeted
intervention programs, and resource utilization. These are described below.

Effectiveness  of  Care.  For  a  measure  to  be  included  in  the  effectiveness  of  care  domain, 
there  must  be  scientific  evidence  that  greater  adherence  to  the  process  being  measured  will  result 
in  improved  health  for  the  population  served  or  that  the  outcome  measured  is  likely  to  be 
substantially  influenced  by  actions  taken  by  the  plan  or  providers.  Finally,  it  is  reasonable  to 
conclude  that  variations  in  performance  on  the  measures  in  this  category  can  be  attributed  to 
differences  in  the  quality  of  service  delivery  in  the  health  plan. 

The  following  HEDIS  3.0  (1997)  effectiveness  of  care  measures  populated  this  category: 
advising  smokers  to  quit,  breast  cancer  screening,  cervical  cancer  screening,  beta  blocker 
treatment  after  heart  attack,  childhood  immunization  rate,  adolescent  immunization  rate,  prenatal 
care  in  the  first  trimester,  check-ups  after  delivery,  and  eye  exams  for  persons  with  diabetes. 

Access  to  Care.  Measures  of  access  to  care  reflect  those  factors  that  have  been  shown  to 
either  facilitate  or  inhibit  an  individual’s  ability  to  obtain  needed  services.  While  these  measures 
may  not  always  be  directly  interpretable  as  meaning  that  individuals  served  by  a  plan  lack  access 
to  care,  they  flag  potential  areas  of  concern.  Access  to  care  measures  include  both  objective 
measures  (e.g.,  capacity  of  the  physician  network,  proximity  of  service  locations  to  the 
population)  and  subjective  measures  (e.g.,  ratings  by  customers  of  experiences  in  trying  to  obtain 
services). 

Measures  in  this  category  came  from  the  Member  Satisfaction  Survey,  Ford  Direct 
Questionnaire,  HEDIS  access/availability  of  care  measures,  and  GeoAccess  data.  Four 
subcategories were developed: new member access to physicians, organizational characteristics,
consumer  ratings  of  access,  and  distance  to  providers. 

Consumer  Assessments  of  Care.  Measures  in  this  category  reflect  consumers’  ratings 
of  their  experiences  with  various  aspects  of  the  health  plan  delivery  system.  Standardized  survey 
tools  facilitate  comparisons  among  health  plans  in  multiple  areas  including  interactions  with 
primary  care  and  specialty  physicians,  financial  aspects,  and  overall  ratings  of  the  health  plan. 

Measures  in  this  category  came  from  the  NCQA  Member  Satisfaction  Survey  and  the 
Direct  Questionnaire.  Four  subcategories  of  measures  were  included:  doctors  and  medical  care, 
health  plan  cost  and  coverage,  health  plan  information  and  administration,  and  health  plan 
assessments. 

Organizational  Structure.  Measures  in  this  category  reflect  the  capacity  of  the  plan  to 
manage  care  for  the  enrolled  population.  A  major  component  of  this  assessment  is  the 
accreditation  status  of  the  plan,  but  information  about  the  quality  of  physicians  in  the  plan,  use  of 
various  management  tools,  financial  and  personnel  stability,  and  responsiveness  to  requests  from 
Ford  were  also  included. 

Measures  in  this  category  came  from  NCQA  Accreditation,  URAC  Accreditation,  Ford 
Direct  Questionnaire,  and  HEDIS.  Five  subcategories  were  created:  accreditation,  physician 
panel,  plan  stability,  management  programs,  and  responsiveness  to  Ford. 

Targeted Intervention Programs. Measures in this category evaluate the extent to
which the health plan has special programs that address the health needs of individuals with
chronic diseases or multiple conditions, as well as health promotion programs. A particular
emphasis  for  this  category  was  on  programs  for  persons  with  cardiovascular  disease,  asthma, 
diabetes,  and  mental  health  diagnoses. 

Measures  in  this  category  came  from  HEDIS  and  the  Ford  Direct  Questionnaire.  Six 
subcategories  were  created:  cardiovascular/cerebrovascular,  other  chronic  conditions,  prenatal 
care  programs,  prevention  programs,  health  status  assessment,  and  mental  health/chemical 
dependency. 

Resource  Utilization.  Measures  in  this  category  provide  insight  into  the  relative 
intensity  of  resource  utilization  among  different  health  plans.  Information  about  the  use  of 
particular procedures, types of admissions, and use of care in various settings is summarized by
the proportion of premium dollars that is accounted for by an estimated expenditure on selected
categories  of  utilization. 

Measures  in  this  category  came  from  HEDIS  and  the  Ford  Direct  Questionnaire.  Seven 
subcategories  were  created:  frequency  of  selected  procedures,  high  occurrence/high  cost  DRGs, 
acute  inpatient  services,  nonacute  inpatient  services,  ambulatory  care,  mental  health/chemical 
dependency,  and  outpatient  drug. 

U.S. News and World Report

The  National  Opinion  Research  Center  (NORC),  under  contract  to  U.S.  News  and  World 
Report  (USNWR),  used  a  quantitatively-based,  bottom-up  methodology  to  create  summary 
categories  of  HMO  performance.  The  USNWR/NORC  method  relied  exclusively  on  HEDIS  and 
Member  Satisfaction  Survey  data.  Factor  analytic  techniques  were  used  to  develop  the 
categories.1  Five  clusters  of  28  measures  explained  two-thirds  of  the  variation  in  plan  scores. 

Prevention. This category included 11 measures: childhood immunizations, well-child
visits in the first 15 months of life, well-child visits for ages 3-6, well adolescent visits for ages
12-21,  prenatal  care  in  the  first  trimester,  check-ups  after  delivery,  breast  cancer  screening, 
cervical  cancer  screening,  beta  blocker  treatment  after  a  heart  attack,  retinal  exams  for  persons 
with  diabetes,  and  follow-up  after  hospitalization  for  a  mental  illness. 

Access  to  Care  for  Adults.  This  category  included  the  percent  of  adults  in  three  age 
groups  (20-44, 45-64, 65  and  older)  who  had  a  preventive  or  ambulatory  care  visit  in  the 
previous  two  years  and  the  percentage  of  primary  care  providers  in  the  plan  accepting  new 
patients. 

Member  Satisfaction.  This  category  included  the  percentages  of  enrollees  responding 
positively  to  questions  on  NCQA’s  Member  Satisfaction  Survey  about  overall  satisfaction, 
receiving  needed  care,  ability  to  obtain  referrals  to  specialists,  choice  of  physicians,  and  the 
ability  to  make  appointments. 

Physicians’  Credentials.  This  category  included  the  percentages  of  primary  care 
providers,  physician  specialists,  pediatricians,  OB/GYNs,  and  geriatricians  who  are  board 
certified. 

Access  to  Care  for  Children.  This  included  the  proportion  of  children  in  three  age 
groups  (12-24  months,  25  months  to  6  years,  7-11  years)  that  had  a  visit  with  a  primary  care 
provider  in  the  previous  year. 

None of the use of services measures related to procedures, specific DRGs, mental
health/chemical dependency, inpatient, or outpatient drug were used in the USNWR/NORC
model. This method was able to produce scores for 271 plans in 45 states using a four-star
reporting strategy; 18 plans were excluded due to insufficient data.

1 More detail about the scoring methodology can be found on NORC’s Web page:
http://www.norc.uchicago.edu/new/hmo.htm

Foundation for Accountability (FACCT)

In contrast to the expert analytic approaches described above, the Foundation for
Accountability (FACCT) has developed a top-down Consumer Information Framework. The
model was developed using a focus group methodology that elicited from consumers the type of
information they need to have available in order to choose among health plans or providers. This
approach both creates categories that reflect how the target audience thinks about the key issues
and labels those categories with terms that are meaningful to the audience. The model has five
components that are described below.

The Basics. Measures in this category describe how well health plans deliver the basics
of good care: access, skill, communication, coordination of care, and follow-up.

Staying  Healthy.  Measures  in  this  category  describe  how  well  health  plans  help  people 
avoid  illness  and  maintain  health  through  education,  prevention  and  risk  reduction. 

Getting  Better.  Measures  in  this  category  describe  how  well  health  plans  help  people 
recover  when  they’re  sick  or  injured  with  appropriate  treatment  and  follow-up. 

Living  With  Illness.  Measures  in  this  category  describe  how  well  health  plans  help 
people  with  chronic  conditions  reduce  symptoms,  avoid  complications  and  maintain  daily 
activities. 

Changing  Needs.  Measures  in  this  category  describe  how  well  health  plans  care  for 
people  and  their  families  at  the  end  of  life  or  when  functional  abilities  change  dramatically. 

FACCT has mapped measures from a variety of current and potential measure sets (e.g.,
HEDIS, CAHPS, FACCT, ORYX) into this framework and has tested the placement of those
individual measures with additional focus groups.2

2 More information about FACCT’s framework and measurement work can be found on FACCT’s Web
page: http://www.facct.org.

National Committee for Quality Assurance (NCQA)

NCQA collaborated with FACCT in the development of a framework that could be used
to report accreditation results by category rather than as a single result. Additional focus groups
were conducted with both consumers and purchasers to develop these categories, which closely
resemble those originally developed by FACCT. The system has five categories as well, but one
of FACCT’s original categories (The Basics) is split into two and one category (Changing
Needs) is not included.

Access and Service. Measures in this category reflect experiences of enrollees in
choosing doctors, making appointments, obtaining desired care, getting specialty referrals, and
resolving problems. Additional measures may include objective assessments during
accreditation of the plan’s ability to provide access to needed services.

Qualified Providers. Measures in this category reflect the quality of individual doctors
and medical groups under contract to the health plan. This may include consumer ratings as well
as objective measures, such as the proportion of doctors in different specialties who are board
certified.

Staying  Healthy.  Measures  in  this  category  reflect  how  well  the  health  plan  helps 
people  avoid  illness  through  preventive  care,  reduction  in  health  risks  and  early  detection  of 
serious  disease. 

Getting  Better.  Measures  in  this  category  reflect  how  well  the  health  plan  helps  people 
recover  when  they  are  sick  or  injured. 

Living  with  Illness.  Measures  in  this  category  reflect  how  well  the  health  plan  helps 
people  with  chronic  conditions  maintain  or  improve  the  quality  of  their  lives  and  avoid 
complications  of  serious  illnesses. 

NCQA is transitioning to using this framework for reporting accreditation and HEDIS
performance measures. Each category would include information from the accreditation process
as well as a subset of the HEDIS performance measures. Benchmarks are established that
incorporate performance within the region in which the health plan operates as well as national
performance.3

3 Accreditation results, HEDIS highlights, and benchmarks can be found on NCQA’s Web page:
http://www.ncqa.org.

Comparison of Categories from Different Methods

Despite the different approaches to constructing frameworks, there are some remarkable
consistencies among the bottom-up and top-down approaches, as illustrated in Table 2.1.

Table 2.1

Comparison of Summary Categories

Ford 1997                   USNWR                     FACCT                 NCQA
--------------------------  ------------------------  --------------------  -------------------
Organizational Structure    Physician’s Credentials   The Basics            Qualified Providers
Access to Care              Adult’s Access;           The Basics            Access and Service
                            Children’s Access
Consumer Assessments        Member Satisfaction
Effectiveness of Care:      Prevention                Staying Healthy       Staying Healthy
  Prevention
Effectiveness of Care:                                Getting Better        Getting Better
  Acute
Effectiveness of Care:                                Living with Illness   Living with Illness
  Chronic and Targeted
  Intervention Programs
Resource Utilization

WHAT  WERE  THE  FINAL  CARS  CATEGORIES? 

The  Steering  Committee  for  the  Coordinated  Autos/UAW  Reporting  System  (CARS) 
project  developed  a  set  of  categories  that  are  closely  related  to  those  used  by  FACCT  and  NCQA 
but  that  reflect  some  priorities  of  concern  to  the  Steering  Committee  representatives.  Five 
categories  were  included  in  the  final  system. 

NCQA  Accreditation  Status.  Based  on  a  review  by  an  independent  group  of  health 
professionals,  health  plans  are  evaluated  against  quality  standards.  Site  visits  are  included  in  the 
accreditation  determination. 

Consumer  Satisfaction.  This  category  provides  a  summary  score  of  the  overall 
satisfaction  of  HMO  members  with  their  health  plan,  including  how  much  they  were  helped  by 
the  care  received,  whether  or  not  they  would  recommend  the  plan  to  family  or  friends,  and 
whether  they  plan  to  remain  enrolled. 

Access  and  Service.  Measures  in  this  category  reflect  experiences  of  enrollees  in 
choosing  doctors,  making  appointments,  obtaining  desired  care,  getting  specialty  referrals,  and 
resolving  problems.  Additional  measures  may  include  objective  assessments  during 
accreditation  of  the  plan’s  ability  to  provide  access  to  needed  services. 

Staying  Healthy.  Measures  in  this  category  reflect  how  well  the  health  plan  helps 
people  avoid  illness  through  preventive  care,  reduction  in  health  risks  and  early  detection  of 
serious  disease. 


Getting  Better/Living  with  Illness.  Measures  in  this  combined  category  reflect  how 
well  the  health  plan  helps  people  recover  when  they  are  sick  or  injured  and  how  it  helps  people 
with  chronic  conditions  maintain  or  improve  the  quality  of  their  lives  and  avoid  complications  of 
serious  illnesses. 

The  CARS  choices  reflect  both  an  interest  in  maintaining  consistency  across  reporting 
years  and  a  desire  to  produce  information  that  employees  will  find  useful.  The  Big  Three  and  the 
UAW  have  placed  considerable  emphasis  on  NCQA  Accreditation  Status  as  a  minimum 
condition  for  plans  with  which  they  contract.  The  accreditation  status  had  been  reported  to 
consumers  in  previous  years  and  there  was  an  interest  in  maintaining  the  importance  of  this 
quality  signal.  Although  there  is  relatively  little  variation  in  this  measure,  because  most  of  the 
plans  that  contract  with  the  automobile  companies  are  already  accredited  or  are  seeking 
accreditation,  the  continued  importance  of  this  objective  assessment  of  health  plan  quality  was 
reflected  in  the  decision  to  report  the  category. 

While the Steering Committee was supportive of NCQA’s decision to divide FACCT’s
The  Basics  category  into  two,  the  category  “Qualified  Providers”  received  little  support  from  the 
group.  In  particular,  the  Steering  Committee  was  concerned  that  plans  scoring  low  on  this 
category  might  be  interpreted  to  have  providers  that  were  not  qualified  and  consumers  might 
wonder  why  such  plans  were  being  offered.  These  concerns  were  only  increased  when  the  few 
measures  that  could  be  used  to  score  this  category  were  examined.  These  measures  were  felt  to 
be  too  narrow  in  scope  to  reflect  an  evaluation  of  provider  qualifications. 

Neither  the  FACCT  nor  NCQA  frameworks  include  a  category  called  Consumer 
Satisfaction,  although  both  Ford  1997  and  USNWR  included  such  categories  (Consumer 
Assessments of Care and Member Satisfaction, respectively). In previous years, the Big Three
had  reported  on  Consumer  Satisfaction  with  health  plans.  Given  the  push  to  enroll  employees  in 
managed  care,  the  Steering  Committee  was  concerned  that  if  Consumer  Satisfaction  was  dropped 
as  a  category,  employees  would  interpret  this  as  a  signal  that  satisfaction  was  no  longer  a  priority 
for  purchasing  decisions. 

The  decision  to  merge  Getting  Better  and  Living  with  Illness  was  made  after  determining 
that  only  four  measures  were  available  to  score  those  categories  (two  measures  in  each  category). 
The CARS Steering Committee believed that this was too few to support a category score, so
the two categories were combined. The intent is to split these categories apart in the future as
more  measures  become  available.  A  number  of  new  chronic  disease  measures  are  on  the  horizon 
to  be  added  to  HEDIS,  so  this  division  may  occur  within  the  next  couple  of  years. 
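
The merging decision described above amounts to a simple rule, which can be sketched as follows. This is an illustration under stated assumptions rather than the Steering Committee's procedure: the threshold of three measures is hypothetical (the report says only that two measures per category were too few), and the measure names are placeholders.

```python
MIN_MEASURES = 3  # hypothetical threshold; the report does not state a number


def merge_sparse_categories(categories, pair, min_measures=MIN_MEASURES):
    """Combine the two named categories if either has too few measures.

    `categories` maps a category name to its list of measures. The input
    is never modified; a new mapping is returned in both cases.
    """
    first, second = pair
    too_sparse = any(len(categories[name]) < min_measures for name in pair)
    if not too_sparse:
        return {name: list(measures) for name, measures in categories.items()}
    merged = {
        name: list(measures)
        for name, measures in categories.items()
        if name not in pair
    }
    merged[f"{first}/{second}"] = list(categories[first]) + list(categories[second])
    return merged


# Placeholder measure names; only the counts (two per category) come from the text.
categories = {
    "Staying Healthy": ["measure_a", "measure_b", "measure_c"],
    "Getting Better": ["measure_d", "measure_e"],
    "Living with Illness": ["measure_f", "measure_g"],
}

reported = merge_sparse_categories(categories, ("Getting Better", "Living with Illness"))
for name, measures in reported.items():
    print(name, len(measures))
```

Because the rule is driven by the measure count, the same function leaves the two categories separate once enough measures are available, which matches the stated intent to split them in the future.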


CHAPTER  3 
SOURCES  OF  DATA 


Health  plan  performance  information  is  available  from  a  variety  of  sources.  The 
following  data  sources  were  evaluated  for  inclusion  in  the  CARS  model:  The  Health  Plan 
Employer  Data  and  Information  Set  (HEDIS),  NCQA’s  Member  Satisfaction  Survey  (MSS),  The 
Consumer  Assessment  of  Health  Plans  Survey  (CAHPS),  accreditation  status  (NCQA,  URAC, 
JCAHO),  GDAHC/GM/DaimlerChrysler’s  Request  for  Information  (RFI),  Ford’s  Direct 
Questionnaire,  GeoAccess,  and  The  Foundation  for  Accountability  (FACCT). 

DESCRIPTION  OF  DATA  SOURCES 

The  Health  Plan  Employer  Data  and  Information  Set  (HEDIS) 

HEDIS  is  a  standardized  set  of  health  plan  performance  measures  that  are  collected  and 
maintained  by  the  National  Committee  for  Quality  Assurance  (NCQA).  HEDIS  measures 
evaluate  the  effectiveness,  availability,  costs,  and  utilization  of  health  care  services  within  a 
managed  care  plan.  Approximately  300  of  the  nation’s  managed  care  plans  submit  HEDIS  data 
to  NCQA  on  an  annual  basis  for  public  release.  HEDIS  3.0/1998  (the  measure  set  reported  in 
1998  on  performance  in  calendar  year  1997)  contains  46  measures  plus  a  member  satisfaction 
survey  for  commercial  enrollees  in  health  plans.  Many  of  the  plans  that  do  not  submit  HEDIS 
performance  data  to  NCQA  for  public  reporting  do  collect  the  data  for  internal  use  or  to  meet 
contracting  requirements. 

NCQA’s  Member  Satisfaction  Survey  (MSS) 

The  MSS  is  a  standardized  survey  that  addresses  satisfaction  with  care  and  is  a  required 
component  of  HEDIS  3.0/1998.  The  MSS  contains  79  items  that  examine  the  following  content 
areas:  socio-demographic  characteristics  of  the  respondent,  screening  questions  for  length  of 
coverage  by  the  health  plan,  satisfaction  with  care  and  the  health  plan’s  services,  and  the 
respondent’s  health  and  daily  activities. 

The  Consumer  Assessment  of  Health  Plans  Survey  (CAHPS) 

CAHPS  is  a  standardized  satisfaction  survey  that  is  completed  by  a  health  plan’s 
members.  The  Adult  Core  Questionnaire  from  CAHPS  contains  46  items;  nine  additional 
measures  appear  in  CAHPS’  Child  Core  Questionnaire.  The  CAHPS  and  HEDIS  satisfaction 
surveys  will  be  converged  for  calendar  year  1998  and  reported  in  HEDIS  1999.  A  converged 
survey  was  not  available  in  HEDIS  3.0/1998. 

Accreditation Status (NCQA, URAC, JCAHO)

NCQA  offers  accreditation  to  managed  care  organizations.  Over  300  health  plans 
nationally  have  been  or  are  scheduled  to  be  reviewed  by  NCQA  for  accreditation.  The  American 
Accreditation  Healthcare  Commission/URAC  offers  eight  different  accreditation  programs  for 
managed  care  plans,  although  the  organization’s  primary  focus  has  been  on  accreditation  of 
preferred  provider  organizations  (PPOs).  The  Joint  Commission  on  Accreditation  of  Healthcare 
Organizations  (JCAHO)  offers  accreditation  to  hospitals,  health  care  networks,  PPOs,  and  health 
care  organizations  that  provide  home  care,  long  term  care,  behavioral  health  care,  and  laboratory 
and  ambulatory  services. 

Request  for  Information  (RFI) 

DaimlerChrysler  and  General  Motors,  in  conjunction  with  the  Greater  Detroit  Area 
Health  Council  (GDAHC),  require  health  plans  with  which  they  contract  to  participate  in  an  RFI 
process.  The  1998  RFI  consists  of  over  400  questions  that  collect  information  regarding  the 
financial,  clinical,  and  service  performance  of  the  health  plan.  A  contractor  then  conducts  site 
visits  to  supplement  the  assessment. 

Ford’s  Direct  Questionnaire 

Health plans that contract with Ford are required to fill out a Direct Questionnaire. The
Direct Questionnaire contains a subset of the RFI questions that collect information on health
plans’ disease management and prevention programs.

GeoAccess

GeoAccess is a proprietary system that provides an objective measure of members’
geographic proximity to health care providers and hospitals. GeoAccess can be used to identify
the proportion of beneficiaries that are within an “ideal” or “acceptable” drive to one or more
primary care physicians and participating hospitals.

Foundation  for  Accountability  (FACCT) 

FACCT  is  a  not-for-profit  organization  that  was  established  to  identify,  develop  and 
endorse  measures  of  health  care  performance.  In  addition  to  its  consumer  information 
framework,  FACCT  has  proposed  a  number  of  performance  measures  designed  to  capture 
consumer-relevant  information. 

CRITERIA  FOR  SELECTING  DATA  SOURCES 

After the potential data sources for health plan performance data were identified, they
were evaluated on four criteria: availability, timeliness, reliability, and the ability to distinguish
between plans. Table 3.1 displays a mapping of the measures in each of the data sources to the
criteria; an “X” means that a majority of measures in the data source meet the criterion.
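
The majority rule behind each “X” can be written down in a few lines. The sketch below is only illustrative: the function name `meets_criterion` and the example flags are assumptions, and a tie (exactly half of the measures) is treated as not meeting the criterion, which the report does not specify.

```python
def meets_criterion(measure_flags):
    """Return True when a majority of a source's measures meet a criterion.

    measure_flags holds one boolean per measure in the data source.
    Exactly half is not counted as a majority (an assumption).
    """
    if not measure_flags:
        return False  # no measures, so nothing can meet the criterion
    return sum(measure_flags) > len(measure_flags) / 2


# Hypothetical data source with five measures, three of which are timely.
timely_flags = [True, True, True, False, False]
mark = "X" if meets_criterion(timely_flags) else ""
print(mark)
```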

Availability 

Inclusion  of  a  data  source  depended  on  how  likely  it  was  to  be  available  on  most  health 
plans  with  which  the  three  automobile  manufacturers  contract.  A  coordinated  strategy  for  health 
plan  performance  reporting  requires  that  a  common  set  of  performance  measures  be  used. 
Measures  that  were  not  available  for  the  majority  of  health  plans  with  which  the  automobile 
manufacturers  contract  could  not  be  used  to  effectively  compare  performance  across  plans. 

Timeliness 

A  data  source  was  only  included  if  the  results  were  expected  to  be  available  in  time  to 
produce  results  for  GM  and  DaimlerChrysler’s  fall  open  enrollment  period  (Ford  has  a  winter 
open  enrollment  period).  In  order  to  score  performance  and  provide  the  results  to  GM  and 
DaimlerChrysler  for  the  publication  of  their  open-enrollment  materials,  data  needed  to  be 
available  to  RAND  by  the  end  of  July,  1998.  Measures  could  meet  the  timeliness  criterion 
without  meeting  the  availability  criterion  (i.e.,  if  results  could  be  obtained  within  the  timeframe 
for  only  a  subset  of  plans). 

Reliability 

The  data  sources  also  needed  to  be  a  reliable  source  of  information.  The  intent  of 
providing  a  coordinated  strategy  for  reporting  on  quality  was  to  send  a  consistent  and  accurate 
message  about  health  plan  performance  to  the  marketplace.  Data  from  unreliable  sources  would 
have  the  potential  to  misrepresent  the  true  performance  of  health  plans.  Representatives  from  the 
automobile  manufacturers  and  the  UAW  were  strongly  committed  to  communicating  an  accurate 
message  on  health  plan  performance  to  the  marketplace  and  specifically  to  their  beneficiaries. 

Distinguish  Among  Plans 

Finally,  a  data  source  needed  to  be  useful  in  distinguishing  among  health  plans.  To 
identify  differences  in  quality,  measures  are  most  useful  when  there  is  variation  in  performance 
or  when  performance  is  consistently  poor.  We  evaluated  whether  plans  were  likely  to  have 
significantly different scores on the measures in each source.

Table 3.1


Data  Sources  and  Inclusion  Criteria 


Criteria for Evaluating Data Sources

Data Source    Available    Timely    Reliable    Distinguishable

HEDIS 

X 

X* 

X 

X 

MSS 

X 

X* 

X 

X 

CAHPS 

X 

X 

NCQA  Accreditation 

X 

X 

X 

X 

URAC  Accreditation 

X 

X 

X 

JCAHO  Accreditation 

X 

X 

X 

RFI 

X** 

X 

?

? 

Direct  Questionnaire 

X** 

X 

?

? 

GeoAccess 

X 

X 

FACCT 

? 

? 

* The HEDIS and MSS measures were available directly from the health plans in time to be used for the open
enrollment materials. However, HEDIS and MSS measures were not available via a direct data feed from
NCQA to meet the open enrollment deadline (see Chapter 7 on Obtaining Data from Health Plans).

**  Availability  of  RFI  and  Direct  Questionnaire  measures  for  all  health  plans  refers  only  to  the  common  set  of 
disease  management  and  prevention  measures  found  in  both  data  sources. 

INITIAL  SELECTION  OF  DATA  SOURCES 

Given the four criteria for the inclusion of a data source (i.e., available, timely, reliable,
and distinguishable), the initial selection of data sources included the following: HEDIS, MSS,
NCQA Accreditation, and the common set of measures found in the RFI and Direct
Questionnaire.

CAHPS  was  eliminated  as  a  data  source  for  1998  performance  scores  because  the  survey 
was  not  fielded  on  enrollees  in  most  health  plans.  Subsequent  iterations  of  the  CARS  model  will 
incorporate  the  converged  CAHPS  and  HEDIS  satisfaction  surveys. 

Accreditation status is believed to be a reliable way to distinguish health plan quality.
Due to contract requirements, there is little variation in accreditation status among the health
plans with which the Big Three contract. However, in the broader universe of health plans,
accreditation status is a significant indicator of health plan quality. All three of the accrediting
bodies post the accreditation status of reviewed organizations on their Web pages. The
availability of URAC and JCAHO Accreditation status was extremely limited for the HMOs that
were to be evaluated under the CARS model; consequently, they were excluded from the model.
NCQA Accreditation was maintained as a data source because it met all of the defined criteria,
including availability for the majority of plans.

The overlapping set of disease management and prevention program questions was
included in the initial selection of data sources. During the initial selection process it was clear
that the data from the RFI and Direct Questionnaire would be available on a timely basis for the
health plans with which the automobile manufacturers contract. However, little was known about
the other characteristics (i.e., reliability and ability to distinguish between health plans) of the
data because this was the first time that DaimlerChrysler, Ford, GM and GDAHC had coordinated
to incorporate identical questions for disease management and prevention programs. The
RFI/Direct Questionnaire measures for disease management and prevention were eventually
dropped from the scoring strategy in large part because health plans felt very strongly that the
data were highly unreliable.

Ford  had  previously  used  information  from  GeoAccess  to  assess  their  beneficiaries’ 
access  to  primary  care  physicians  and  hospitals.  GeoAccess  scores  must  be  purchased  for  each 
health  plan,  so  while  the  data  were  available,  an  extra  expenditure  would  have  been  required. 
GeoAccess  results  demonstrated  little  variation  in  access  to  providers  between  the  plans  with 
which  Ford  contracts.  It  was  decided  that  the  information  gained  through  GeoAccess  best 
informs  the  decision  of  whether  or  not  to  contract  with  a  health  plan.  Because  the  purpose  of  the 
CARS  project  was  to  provide  information  to  consumers  for  choice  among  plans  already  selected 
by  the  CARS,  GeoAccess  was  excluded  from  the  system  based  on  its  relatively  poor 
discrimination  among  plans. 


CHAPTER  4 

ASSIGNING  MEASURES  TO  CATEGORIES 


Once  a  framework  and  data  sources  had  been  selected,  the  next  task  was  to  assign 
individual  measures  (for  surveys,  these  are  often  called  items)  to  the  appropriate  framework 
category.  There  are  three  basic  approaches  to  this  task: 

•  Expert  judgment:  A  set  of  experts  who  understand  the  content  of  the  measures  to  be 
included  in  the  system  use  the  framework  category  definition  to  guide  assignment  of 
measures  to  categories.  Generally,  this  is  done  with  multiple  experts  and  formal  or 
informal  methods  may  be  used  to  arrive  at  consensus. 

•  Consumer  judgment:  One  or  more  groups  of  consumers  could  be  asked  to  assign 
measures  to  categories.  This  may  be  done  in  focus  groups,  one-on-one  interviews,  or 
through  a  large-scale  voting  method. 

•  Analytic  assignment:  Methods  such  as  factor  analysis  can  be  used  to  assign  measures 
to  categories  based  on  empirically  observed  relationships  among  the  measures.  The 
USNWR/NORC  method  described  in  Chapter  2  is  an  example  of  how  categories  were 
created  and  measures  assigned  using  factor  analytic  techniques. 

Each  of  the  methods  has  advantages  and  disadvantages.  In  the  interest  of  time  and 
budgets,  the  CARS  Steering  Committee  selected  the  first  approach,  expert  judgment,  to  assign 
measures  to  categories. 

The  method  that  was  used  could  be  called  a  “very  modified  Delphi  method.”  The  first 
step  was  for  RAND  to  make  an  initial  selection  of  measures  for  inclusion  from  each  of  the  data 
sources  that  had  been  selected.  For  example,  since  the  CARS  framework  does  not  include 
categories  that  make  use  of  cost  or  utilization  information,  most  of  the  measures  in  the  HEDIS 
use  of  services  category  were  not  included. 

Representatives  from  RAND,  FACCT  and  NCQA  were  asked  to  assign  the  selected 
measures  to  the  CARS  framework  categories  independently.  Because  of  the  short  timeframe, 
formal  votes  were  received  from  only  RAND  and  FACCT.  Representatives  from  each  of  the 
three  organizations  then  met  to  discuss  differences  and  to  resolve  as  many  as  possible.  Most 
differences  were  satisfactorily  resolved. 

RAND  presented  the  results  of  the  expert  process  to  the  CARS  Steering  Committee 
which  was  asked  to  evaluate  the  results  and  resolve  any  outstanding  differences. 

The results of this process are summarized in Table 4.1. The first column lists the
individual measures (which for the survey are individual questions). The next six columns
reflect the votes made by RAND and FACCT by category, including additional
recommendations to exclude the measure from the framework. The seventh column contains a
flag for disagreements. The last column shows the final category assignment; this process
resulted in 43 measures being assigned to categories.
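
The disagreement flag in the seventh column amounts to comparing two sets of independent votes. The following is a minimal sketch of that comparison, not the procedure RAND actually used; the measure names, category labels, and the function `flag_disagreements` are hypothetical placeholders.

```python
def flag_disagreements(votes_a, votes_b):
    """Map each measure voted on by both groups to True if the votes differ."""
    return {
        measure: votes_a[measure] != votes_b[measure]
        for measure in votes_a
        if measure in votes_b
    }


# Hypothetical votes from two organizations (placeholder names and categories).
rand_votes = {"measure_1": "Category A", "measure_2": "Category B"}
facct_votes = {"measure_1": "Category A", "measure_2": "Category A"}

flags = flag_disagreements(rand_votes, facct_votes)
# Measures that would be carried forward to the discussion meeting.
disputed = [measure for measure, differs in flags.items() if differs]
print(disputed)
```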

The  voting  was  done  before  the  framework  had  been  completely  decided,  so  this  table 
reflects  the  NCQA  framework  rather  than  the  final  CARS  framework.  The  information  is 
presented  to  give  an  idea  of  the  areas  of  controversy  and  to  illustrate  the  method  that  was  used. 
Expert  opinion  was  also  used  to  create  the  Consumer  Satisfaction  category  that  ultimately  was 
included  in  the  CARS  framework. 

Two  examples  of  how  viewpoints  can  affect  assignments  are  presented  to  illustrate  some 
of  the  differences  we  encountered.  One  of  the  HEDIS  measures  is  Breast  Cancer  Screening, 
which  reflects  a  plan’s  mammography  screening  rate  for  women  age  50  and  older.  From  a 
clinical  perspective,  mammography  is  used  to  identify  breast  cancer  at  a  sufficiently  early  stage 
that  the  opportunities  for  a  cure  are  substantially  increased.  This  would  imply  that  the  measure 
could  be  placed  in  the  Getting  Better  category  (helping  people  who  are  sick  recover).  From  a 
consumer  perspective,  however,  mammography  is  done  to  help  people  avoid  illness  and  as  such 
belongs  in  the  Staying  Healthy  category.  For  CARS,  breast  cancer  screening  was  placed  in  the 
Staying  Healthy  category. 

Another  measure,  obtained  from  survey  data,  determines  whether  persons  in  a  plan  who 
smoke  were  advised  by  their  physician  to  stop  smoking.  From  a  consumer  perspective,  this 
measure  reflects  how  well  health  plans  help  people  avoid  illness  and  reduce  risks.  From  a 
clinical  perspective,  many  smokers  are  already  suffering  health  consequences  from  their 
behavior  which  would  place  the  measure  in  the  Getting  Better  category.  One  good  time  to  get 
people  to  stop  smoking  is  when  they  are  suffering  from  other  illnesses,  particularly  respiratory  or 
cardiac  problems.  This  measure  was  ultimately  placed  in  the  Getting  Better  category. 

Analytic  methods,  such  as  examining  correlations  between  items  and  scales,  can  also  be 
used  to  evaluate  where  a  measure  should  be  placed.  Such  methods  were  not  utilized  during  this 
round. 


Table 4.1

Results of Expert Voting to Assign Measures to Categories

10a. How long did it take for the health plan to resolve your complaint?

15. Over the last 12 months, how many days did you usually have to
wait between the time you made an appointment for care and the day
you actually saw the provider? c. Chronic or ongoing condition


CHAPTER 5

AGGREGATION  METHODS 


A  number  of  methodological  issues  arise  when  combining  data  to  produce 
aggregate  scores,  including:  handling  missing  data,  standardization  of  scales,  weighting, 
and  incorporating  uncertainty.  Each  of  these  issues  is  discussed  in  this  section. 

For  each  of  these  areas,  a  two-stage  approach  was  taken.  First,  it  was  determined 
whether  any  of  the  various  options  for  addressing  an  issue  was  likely  to  produce  a 
different result. If the results were insensitive to the method of handling the issue, then
RAND  recommended  the  method  that  was  most  defensible  within  the  context  of  the 
project  goals.  If  the  results  were  sensitive  to  different  methods,  RAND  presented  the 
advantages  and  disadvantages  of  each  option  to  the  CARS  Steering  Committee  for  a 
policy  decision.  Frequently,  choices  required  a  compromise  between  statistical  rigor, 
incentives  for  improved  reporting  in  future  years,  and  short  run  expediency. 

HANDLING  MISSING  DATA 

It  was  known  that  plans  would  be  missing  data  in  a  variety  of  different  patterns. 
Some  plans  might  not  report  any  items  in  a  particular  data  source.  Other  plans  might 
report  some  information  but  not  all  (e.g.,  some  HEDIS  measures  might  not  be  reported). 

Six alternatives for handling missing data are possible: (a) do not report a score
for the plan unless all items are available; (b) report the average of only those items that
are reported; (c) impute the average value for the measure among plans that reported; (d)
impute a score from regression analysis; (e) impute a zero value for the measure; (f)
impute the lowest value reported by plans that provided data. RAND modeled four of these
alternatives (a, c, e, f) using 1997 HEDIS and MSS data and found that the method of
handling missing data did affect the results. Each of the options is described below, followed
by an analysis of the differences across the options.

Require  Complete  Reporting 

Under  this  option,  plans  missing  any  individual  measure  in  a  scale  would  receive 
“NR”  as  their  score.  There  was  concern  that  this  option  would  significantly  decrease  the 
number  of  plans  for  which  a  numeric  summary  score  was  available  and  that  consumers 
might  have  difficulty  comparing  “not  reporting”  against  an  actual  value.  This  strategy 
would  also  eliminate  plans  that  had  any  data  missing,  thus  potentially  penalizing  plans 
that provided most information.

Summarize Available Data

Another  possibility  is  to  summarize  only  the  available  data  elements.  For 
example,  if  the  score  is  the  average  of  5  items  and  only  3  items  are  non-missing,  the 
average  of  the  non-missing  items  could  be  taken.  The  challenge  with  this  approach  is  to 
make  it  “fair.”  Plans  might  be  encouraged  to  report  only  the  measures  on  which  they 
perform  well.  If  the  items  that  a  plan  is  missing  are  the  most  difficult  to  score  well  on, 
then  averaging  the  easy  items  is  unfair  to  the  plans  that  report  more  completely.  Fixing 
this  problem  quickly  leads  to  elaborate  imputation  methods. 
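
The averaging of available items can be sketched as follows. This is an illustration under stated assumptions rather than the project's code: missing items are represented by `None`, and the function name and scores are hypothetical.

```python
from statistics import mean


def average_available(items):
    """Average only the reported items; None marks a missing item."""
    reported = [value for value in items if value is not None]
    if not reported:
        return None  # nothing reported, so no score can be computed
    return mean(reported)


# Hypothetical plan with 5 items in the scale, 2 of them missing.
plan_items = [80.0, None, 70.0, None, 90.0]
score = average_available(plan_items)
print(score)
```

Note that the score is silent about which items were skipped, which is exactly the fairness problem described above.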

Impute  the  Average  from  Reporting  Plans 

This  approach  is  frequently  taken  in  research  because  it  is  simple  and  matches  the 
mean  of  the  imputed  data  to  the  mean  of  the  raw  data.  The  policy  problem  is  that  it  does 
not  encourage  plans  whose  performance  is  below  average  to  report  because  the  imputed 
score  they  will  receive  is  better  than  their  actual  score.  Such  incentives  are  inconsistent 
with  the  policy  of  the  CARS  Steering  Committee  which  is  to  require  all  plans  to  report 
complete  information. 

Regression  Imputation 

More elaborate imputation methods are common in research analysis. The next
level up on the complexity scale is regression imputation. Regression imputation uses
the complete plans’ data to develop a model of the relationship between the various items
in a scale. This model is then used to impute missing values for the plans with
incomplete data. This approach would have been pursued further if the sense of the
committee had not already been headed in the direction of methods that strongly
encouraged reporting. Regression imputation also requires more lead time
between receipt of the data and reporting than RAND anticipated having.
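
As a rough illustration of the idea, the sketch below fits a one-predictor least-squares line on plans that reported both items and uses it to fill in a missing item. The report does not specify a model, so the single predictor, the function names, and the data are all assumptions.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical complete plans: both items of a scale were reported.
item_1 = [50.0, 60.0, 70.0]
item_2 = [55.0, 65.0, 75.0]
slope, intercept = fit_line(item_1, item_2)

# Hypothetical incomplete plan: item 1 reported, item 2 missing.
imputed_item_2 = intercept + slope * 80.0
print(imputed_item_2)
```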

Imputing  a  Zero 

In  this  option,  plans  not  reporting  a  result  for  a  measure  would  receive  a  value  of 
zero  for  that  measure.  This  option  was  viewed  as  both  overly  harsh,  particularly  to  plans 
that  report  most  of  the  data,  and  potentially  not  distinguishable  from  the  “not  reporting” 
option.  It  may,  however,  create  incentives  to  report  in  future  years. 

Imputing  the  Lowest  Observed  Score 

Under this option, plans not reporting a result for a measure would receive a value
equal to the lowest number reported by any plan on that measure. This approach has the
advantage of producing a nonzero score for a plan but placing performance at a low
enough level that most plans should be able to improve their overall score by reporting
performance. This option was used at both the measure and data source level; that is, if
a plan failed to report one or more measures within a category it would receive the lowest
score reported for each measure. This was also true if a plan reported no measures within
a category (e.g., if the plan had not fielded the Member Satisfaction Survey). For these
items, the standard error was also set to zero, because we know the score with certainty.

Results 

The  two  major  options  under  consideration  for  handling  missing  data  were  mean  and 
minimum  value  imputation.  They  also  provide  a  sharp  contrast  in  strategies.  Mean  imputation  is 
an  attempt,  however  simple,  to  fill  in  an  estimate  of  what  the  missing  value  would  have  been  had 
it  been  available.  Minimum  value  imputation  is  not  an  attempt  to  estimate  the  missing  value. 
Instead  it  is  a  policy  choice  designed  to  encourage  more  complete  reporting  in  future  years. 
Minimum  value  imputation  sends  the  signal  that  reporting  what  you  have  is  better  than  not 
reporting,  no  matter  how  poor  your  scores  are. 
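
The contrast between the two strategies is easy to see on a single measure. The sketch below is a minimal illustration, assuming missing values are stored as `None`; the function name `impute` and the plan values are hypothetical.

```python
from statistics import mean


def impute(values, method):
    """Fill missing (None) entries of one measure across plans."""
    reported = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(reported)  # estimate of the missing value
    elif method == "minimum":
        fill = min(reported)  # policy choice: lowest reported score
    else:
        raise ValueError("method must be 'mean' or 'minimum'")
    return [fill if v is None else v for v in values]


# Hypothetical measure reported by three of four plans.
measure = [60.0, 80.0, None, 70.0]
by_mean = impute(measure, "mean")
by_minimum = impute(measure, "minimum")
print(by_mean, by_minimum)
```

Under mean imputation the non-reporting plan is scored as average; under minimum value imputation it ties with the worst reporting plan.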

Tables  5.1  through  5.5  present  the  results  for  the  plans  using  both  mean  and  minimum 
value  imputation.  The  methods  used  to  put  plans  into  one  of  three  result  categories  are  discussed 
later  in  this  chapter.  Note  that  plans  that  land  on  the  diagonal  (i.e.,  12, 90, 21  in  Table  5.1) 
would  receive  the  same  number  of  stars  under  each  method.  Plans  off  the  diagonal  would 
receive  different  scores  under  the  two  methods.  For  example,  in  Table  5.1,  of  the  97  plans  that 
would  receive  2  stars  under  mean  imputation  6  would  receive  1  star  and  one  would  receive  3 
stars  under  minimum  value  imputation.  The  6  plans  that  moved  down  received  lower  scores 
because  of  missing  data.  The  one  plan  that  moved  up  did  so  because  the  minimum  value 
imputation  method  moved  the  cut  point  lower. 
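
The comparison tables are cross-tabulations of star ratings under the two methods. A minimal sketch of building such a table and counting the diagonal is given below; the six plan ratings are hypothetical, while the 2-star row (6, 90, 1) is taken from the text above.

```python
from collections import Counter


def cross_tabulate(stars_mean, stars_min):
    """Count plans by (stars under mean, stars under minimum) imputation."""
    return Counter(zip(stars_mean, stars_min))


def plans_on_diagonal(table):
    """Number of plans that receive the same stars under both methods."""
    return sum(count for (a, b), count in table.items() if a == b)


# Hypothetical star ratings for six plans under each method.
stars_mean = [1, 2, 2, 2, 3, 3]
stars_min = [1, 1, 2, 3, 3, 3]
table = cross_tabulate(stars_mean, stars_min)
unchanged = plans_on_diagonal(table)

# 2-star row described in the text: 6 moved down, 90 unchanged, 1 moved up.
two_star_row = {1: 6, 2: 90, 3: 1}
row_total = sum(two_star_row.values())
print(unchanged, row_total)
```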

In  general,  the  choice  of  missing  value  methods  has  a  substantial  effect  on  plan  results 
across  all  domains.  Table  5.5  presents  the  correlations  between  the  imputed  scales.  Note  that 
the  correlations  are  fairly  high,  0.9  or  better,  with  the  exception  of  the  Getting  Better/Living 
With  Illness  category.  This  is  a  consequence  of  plans  having  more  missing  data  in  this  category. 
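
For readers who want to reproduce this kind of comparison, a correlation between two sets of plan scores can be computed as sketched below. The report does not state which coefficient was used; the Pearson coefficient and the example scores are assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical category scores for four plans under each imputation method.
scores_mean_imputed = [3.1, 2.8, 3.5, 2.9]
scores_min_imputed = [3.1, 2.2, 3.5, 2.9]
r = pearson(scores_mean_imputed, scores_min_imputed)
print(round(r, 3))
```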




Table  5.1 


Consumer  Satisfaction:  Comparison  of  Missing  Value 
Imputation  Methods 


Table  5.2 

Access  and  Service:  Comparison  of  Missing  Value  Imputation 

Methods 


Table  5.3 

Staying  Healthy:  Comparison  of  Missing  Value  Imputation 

Methods 




Table  5.4 

Getting Better/Living with Illness: Comparison of Missing
Value Imputation Methods

Table 5.5

Correlation of Scores for Minimum Value Imputation and Mean Value Imputation

Category                                Correlation Coefficient
Consumer Satisfaction                   0.928
Access and Service                      0.913
Staying Healthy                         0.906
Getting Better/Living with Illness      0.742

Policy  Implications  and  Recommendations 

The missing value imputation decision has the largest leverage of any technical decision
in the CARS reporting process on the ultimate plan rankings. RAND recommended, and the
CARS Steering Committee concurred, that a long-run view was appropriate. Minimum value
imputation was thought to provide the right incentives for reporting in future years.

STANDARDIZING  SCALES 

The  individual  measures  that  were  being  combined  to  create  each  of  the  summary 
categories  were  known,  based  on  1997  data,  to  have  different  means  and  variances.  This 
potentially  presents  a  problem  for  scaling  in  that  it  can  make  some  measures  have  a  greater  (or 
lesser)  effect  on  the  results  because  of  their  distributional  properties.  RAND  modeled  the  scales 
based  on  1997  data  and  found  few  differences  between  the  results  based  on  standardized  versus 
nonstandardized  scales.  However,  because  standardization  is  more  defensible  analytically,  all 
items  within  scales  were  standardized. 

Standardization is a simple calculation, but is frequently misunderstood due to its
similarity to related statistical calculations. The idea is to transform item scores so that
plans are ranked on a comparable scale across items. This prevents an item with a large
range (say 0-100) from completely dominating an item with a small range (say 0-1).
Standardization is accomplished by dividing all of the plan values by the standard
deviation of the plan values. For example, an item on the 0-100 scale might have a
plan-to-plan standard deviation of 20. The standardized item would then run from 0-5. An
item on the 0-1 scale could have a plan-to-plan standard deviation of 0.25. After
standardization it would range from 0-4. The standardized scores are much more
comparable than the original factor of 100 difference. Frequently the mean is subtracted
prior to division by the standard deviation. This was not done in the analysis since only
relative scores were used.

The  plan-to-plan  standard  deviation  is  the  standard  deviation  of  the  plan  averages.  Note 
that  it  is  not  the  standard  error  of  the  mean  or  the  standard  deviation  of  the  person  level  values. 

If the plan means were normally distributed, most plans would fall within two units of the
overall mean after our standardization (that is, between -2 and +2 once the mean is subtracted).
One potential source of confusion is the superficial similarity of the
standardizing calculation to the z-score test of difference from the mean. In the z-score test, the
divisor is a measure of sampling error rather than of distributional variability. Z-scores for plans
could easily produce values far from the -2 to +2 range. Standardization is not meant to test the
difference of the plan from the mean. It is just meant to put items on roughly comparable scales.
The formula is shown below.


Standardized Item = (Plan Value) / (Plan-to-Plan Standard Deviation)
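
A minimal sketch of this calculation follows. It assumes the plan-to-plan standard deviation is computed as a population standard deviation of the plan averages (the report does not say whether the population or sample form was used); the function name and plan values are hypothetical.

```python
from statistics import pstdev


def standardize(plan_values):
    """Divide each plan average by the plan-to-plan standard deviation.

    The mean is deliberately not subtracted, matching the description above.
    """
    sd = pstdev(plan_values)  # spread of plan averages, not a standard error
    if sd == 0:
        raise ValueError("all plans have the same value; cannot standardize")
    return [value / sd for value in plan_values]


# Hypothetical plan averages on a 0-100 scale; their standard deviation is 20.
rates = [30.0, 70.0, 30.0, 70.0]
standardized = standardize(rates)

# The worked example from the text: the top of each scale after division.
top_of_100_scale = 100 / 20
top_of_unit_scale = 1 / 0.25
print(standardized, top_of_100_scale, top_of_unit_scale)
```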


The  benefit  of  standardizing  is  that  it  makes  it  easier  to  compare  items  and  to  understand 
the  meaning  of  weights  applied  to  those  items.  By  thinking  in  the  standard  deviation  scale, 
simple  rules  of  thumb  based  on  the  normal  distribution  make  comparison  easier.  Thinking  of  a  1 
standard  deviation  increase  in  each  item  is  often  easier  than  comparing  a  35  point  increase  in  a 
100  point  scale  to  a  .012  increase  in  the  mean  of  a  dichotomous  variable,  for  example. 


Results 

It was a surprise to see that standardization had little effect on plan rankings. Tables 5.6
through 5.9 show that, across the four domains, very few plans change the number of stars that
they would receive whether standardization is used or not. In retrospect, this probably should
not have been a surprise. In most of the scales, the items were largely scored on the same scale
prior to standardization. This similarity, combined with some correlation between items, mutes
the effect of standardization. Note the very high (greater than 0.97) correlation between the
standardized and unstandardized scales in Table 5.10.

Table 5.6


Consumer  Satisfaction:  Comparison  of  Standardization  and  No 

Standardization 


Table  5.7 


Staying  Healthy:  Comparison  of  Standardization  and  No  Standardization 


Table  5.8 


Access and Service: Comparison of Standardization and No Standardization


Table 5.9


Getting Better/Living with Illness: Comparison of Standardization and No Standardization


Table 5.10

Correlation of Scores Between Standardization and No Standardization

Category                                Correlation Coefficient
Consumer Satisfaction                   0.991
Access and Service                      0.988
Staying Healthy                         0.971
Getting Better/Living with Illness      0.979

Policy  Implications  and  Recommendations 

Although  there  is  very  little  leverage  to  the  standardize  vs.  don’t  standardize  decision, 
RAND  recommends  standardization.  At  the  cost  of  a  very  small  increase  in  complexity, 
standardization  makes  the  meaning  of  weighting  and  the  interpretation  of  scales  more 
straightforward. 

WEIGHTING 

There  were  two  major  choices  in  constructing  scales:  (a)  treat  all  measures  as  equally 
important  thus  allowing  each  measure  to  contribute  an  equal  amount  to  the  scale;  (b)  develop 
differential  weights  for  measures  within  each  scale.  While  there  are  a  variety  of  different 
approaches  that  can  be  taken  to  developing  weights,  given  the  time  constraints  and  experience  of 
the  RAND  team  with  the  measures,  RAND  proposed  a  set  of  expert  weights.  These  weights  are 
presented in Tables 5.11 through 5.14.

Table 5.11

Measures and Weights for Consumer Satisfaction Scale

Item (from Member Satisfaction Survey)                                          Weight
How much were you helped by the care you received?                              1
How would you rate the overall quality of care and services?                    1
All things considered, how satisfied are you with your current health plan?     1
Would you recommend your current health plan to your family or friends if       1
  they needed care?
Do you intend to switch to a different health plan when you next have an        1
  opportunity?
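
How weights enter a scale score can be sketched as a weighted average of standardized items. This is an assumption about the form of the combination, which the text here does not spell out; the function name and item values are hypothetical. With all weights equal to 1, as in Table 5.11, the result is simply the mean of the items.

```python
def scale_score(standardized_items, weights):
    """Weighted average of a plan's standardized item scores."""
    if len(standardized_items) != len(weights):
        raise ValueError("need exactly one weight per item")
    total_weight = sum(weights)
    weighted_sum = sum(w * x for w, x in zip(weights, standardized_items))
    return weighted_sum / total_weight


# Hypothetical standardized item scores for one plan.
items = [2.0, 3.0, 4.0]
equal_weight_score = scale_score(items, [1, 1, 1])
# Hypothetical differential weights that double the first item's influence.
differential_score = scale_score(items, [2, 1, 1])
print(equal_weight_score, differential_score)
```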

Table 5.12

Measures and Weights for Access and Service Scale

Scale/Item (from Member Satisfaction Survey)    Weight

How would you rate the ease of making appointments by phone?    1
How would you rate the length of time you had to wait between making an appointment for routine care and the day of your visit?    1
How would you rate the number of doctors you had to choose from?    1
Have you had a problem with delays in your medical care while you waited for approval by your health plan?    1
Have you had any difficulty in receiving care you and your doctor believed was necessary?    1
Have you ever not been able to get a referral to a specialist that you wanted to see?    1
How many days did you have to wait between making an appointment for routine care and the day you saw the provider?    1
How many days did you have to wait between making an appointment for a minor illness or injury and the day you saw the provider?    1
How many days did you have to wait between making an appointment for chronic or ongoing care and the day you saw the provider?    1
How many days did you have to wait between making an appointment for urgent care and the day you saw the provider?    1
How long did it usually take your provider's office to return a call?    1
How long did you have to wait to see your provider once you arrived at the office for an appointment?    1
Table 5.12

Measures and Weights for Access and Service Scale (cont)

Scale/Item (from Member Satisfaction Survey)    Weight

How often did you see the same provider over the last 12 months?    1
How would you rate the thoroughness of the treatment you received?    1
How would you rate the attention given to what you had to say?    1
How would you rate the ease of choosing a personal physician?    1
How would you rate the amount of time you had with the doctor?    1
How would you rate the types of services covered?    1
How would you rate the availability of eligibility and coverage information?    1
How would you rate the availability of cost information?    1
How would you rate the time required to fill out paperwork?    1
Have you called with a complaint?    1
If you called with a complaint, how long did it take to resolve it?    1

The  measures  for  Consumer  Satisfaction  each  received  equal  weight.  The  measures 
shown  in  Tables  5.11  and  5.12  are  not  written  exactly  as  presented  in  the  survey  instrument,  but 
capture  the  basic  content  of  the  question.  Each  of  the  measures  received  equal  weight  within  the 
Access  and  Service  scale  but  the  Access  subscale  was  given  twice  as  much  weight  as  the  Service 
subscale  in  producing  the  overall  score  for  this  dimension. 
Table 5.13

Measures  and  Weights  for  the  Staying  Healthy  Scale 


Item  Weight 

Childhood  immunizations  rate  15 

Adolescent  immunizations  rate  5 

Prenatal  care  initiated  in  the  first  trimester  14 

Checkups  after  delivery  1 

Six  or  more  well  child  visits  (0-15  months)  5 

One  or  more  well  child  visits  (3-6  years)  5 

One  or  more  adolescent  well  care  visits  (12-21  years)  5 

Breast  cancer  screening  rate  30 

Cervical  cancer  screening  rate  20 


Table  5.14 

Measures  and  Weights  for  the  Getting  Better/Living  with  Illness  Scale 


Item  Weight 

Follow-up  after  hospitalization  for  mental  illness  20 

Advising  smokers  to  quit  30 

Eye  exams  for  persons  with  diabetes  20 

Beta  blocker  treatment  after  heart  attack  30 


The  measures  in  the  Staying  Healthy  and  Getting  Better/Living  with  Illness  scales  were 
weighted  to  reflect  the  expected  health  benefits  to  the  population  of  improved  performance  on 
the  measure.  This  information  was  derived  from  a  previous  review  of  the  scientific  literature 
conducted by RAND.4
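
As a rough illustration of how differential weights enter a scale, the sketch below combines item results into a single score using the Table 5.14 weights. The item rates are made-up values, the dictionary keys and function names are hypothetical, and dividing by the sum of the weights (so that the score stays on the same range as the items) is an assumption; the report does not spell out the normalization, and the standardization step discussed earlier is omitted here.

```python
# Weights from Table 5.14 (Getting Better/Living with Illness scale).
WEIGHTS = {
    "followup_after_mental_illness_hospitalization": 20,
    "advising_smokers_to_quit": 30,
    "eye_exams_for_persons_with_diabetes": 20,
    "beta_blocker_after_heart_attack": 30,
}


def weighted_scale_score(rates, weights):
    """Weighted average of item rates (assumed normalization: divide by total weight)."""
    total_weight = sum(weights.values())
    return sum(weights[item] * rates[item] for item in weights) / total_weight


def unweighted_scale_score(rates):
    """Equal-weight alternative: a simple mean of the item rates."""
    return sum(rates.values()) / len(rates)


# Hypothetical plan results (proportions), for illustration only.
example_rates = {
    "followup_after_mental_illness_hospitalization": 0.70,
    "advising_smokers_to_quit": 0.60,
    "eye_exams_for_persons_with_diabetes": 0.40,
    "beta_blocker_after_heart_attack": 0.80,
}

print(weighted_scale_score(example_rates, WEIGHTS))
print(unweighted_scale_score(example_rates))
```

With these illustrative inputs the two scores differ only slightly, because the measures with the largest weights happen to have rates close to the plan's average.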

Results 

The  1997  Quality  Compass  data  were  used  to  illustrate  the  effect  of  weights  on  results. 
There  is  a  small  effect  of  weighting  on  plan  scores.  Tables  5.15-5.18  compare  the  results  for 
plans with and without weighting. The Staying Healthy scale shows some
differences  as  a  result  of  weighting.  Note  in  Table  5.19  that  all  of  the  correlations  (weighted  vs. 
unweighted)  are  above  0.92. 
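
The comparison summarized in Table 5.19 can be sketched as a Pearson correlation between each plan's weighted and unweighted scale score. The plan scores below are invented for illustration and the function name is hypothetical; the report does not state which correlation coefficient was computed, so Pearson is an assumption.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scale scores for five plans, computed with and without weights.
weighted_scores = [0.64, 0.71, 0.55, 0.80, 0.62]
unweighted_scores = [0.625, 0.70, 0.57, 0.78, 0.64]

r = pearson(weighted_scores, unweighted_scores)
print(round(r, 3))
```

A coefficient close to 1 means that weighting leaves the relative ordering of plans largely unchanged, which is the pattern the report describes.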


4 McGlynn, Keeler, Tseng, et al. Health Benefits and Cost Effectiveness of Attaining the Goals Set
by the HEDIS 2.5 Quality Indicators, unpublished.
Table 5.15

Consumer Satisfaction: Comparison of Weights versus No Weights


Table  5.16 

Access  and  Service:  Comparison  of  Weights  versus  No  Weights 


Table  5.17 


Staying  Healthy:  Comparison  of  Weights  versus  No  Weights 


Table  5.18 

Getting  Better/Living  with  Illness:  Comparison  of  Weights 
versus  No  Weights 
Table 5.19

Correlation of Scores with Weights vs. Without Weights

Category                              Correlation Coefficient
Consumer Satisfaction                 1.00
Access and Service                    0.998
Staying Healthy                       0.927
Getting Better/Living with Illness    0.992

Policy  Implications  and  Recommendations 

Weighting  items  does  have  a  small  effect  on  the  results.  The  expert  weights  were 
reviewed  by  the  CARS  Steering  Committee  and  approved  with  some  modifications.  The  final 
measures  in  each  scale  and  associated  weights  are  shown  in  Tables  5.11  -  5.14.  Weighting  does 
focus  attention  on  the  importance  of  the  processes  measured  in  each  scale. 

STANDARD  ERROR  OF  THE  SCALE  SCORES 

Measurement  of  health  plan  performance  is  an  imperfect  science;  a  variety  of  errors 
occur  in  measuring  performance.  In  order  to  provide  valid  information  for  users  of  performance 
measures,  we  must  find  methods  to  incorporate  some  of  the  uncertainties  about  the  results  into 
the  reporting. 

Item  level  measurement  standard  errors 

We  had  to  calculate  standard  errors  from  the  available  confidence  intervals  for  items 
when  the  standard  errors  were  not  reported  by  plans.  This  was  accomplished  by  solving  the 
HEDIS  formulas  for  the  sample  size  and  calculating  the  standard  errors.  For  a  few  plans,  we 
obtained rates of performance but not standard errors or sample sizes. The required sample sizes
for these plans were imputed by averaging, across plans, the smaller of each plan's reported N
and 411. This was the best available estimate of the hybrid method N. The
smaller  N  was  chosen  to  avoid  assigning  a  small  standard  error  when  we  were  uncertain  of  the 
true  sample  size. 
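
A minimal sketch of this back-calculation follows. It assumes the reported interval is a symmetric 95% normal-approximation interval for a proportion; the actual HEDIS formulas are not reproduced in this report, so that form is an assumption, as are the function names. The sketch recovers the standard error from the interval half-width, solves the binomial variance formula for the implied sample size, and shows the conservative cap at 411.

```python
import math

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a symmetric confidence interval."""
    return (upper - lower) / (2 * z)


def implied_n(rate, se):
    """Solve se = sqrt(p * (1 - p) / n) for n."""
    return rate * (1 - rate) / se ** 2


def se_from_n(rate, n):
    """Binomial standard error of a rate estimated from n sampled members."""
    return math.sqrt(rate * (1 - rate) / n)


def conservative_n(reported_n, cap=411):
    """Use the smaller of the reported N and 411, so the standard error is not understated."""
    return min(reported_n, cap)


# Illustrative values only: a rate of 0.80 reported with interval (0.7608, 0.8392).
se = se_from_ci(0.7608, 0.8392)
print(se, implied_n(0.80, se))
print(se_from_n(0.80, conservative_n(1200)))
```

Choosing the smaller N yields the larger standard error, which matches the stated intent of not assigning a small standard error when the true sample size is uncertain.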

Scale  standard  errors 

The  scales  were  ultimately  the  weighted  sum  of  the  individual  items.  The  weights  and 
the  item  level  standard  errors  were  used  to  calculate  an  overall  standard  error  for  the  scale. 
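
One way the weights and item-level standard errors could be combined is sketched below. It assumes the item estimates are independent, so that the variance of a weighted sum is the sum of the squared weights times the item variances, and it assumes the weights are normalized to sum to one. Neither assumption is stated in the report, and the function names are hypothetical.

```python
import math


def scale_standard_error(weights, item_ses):
    """Standard error of a weighted average of independent item estimates."""
    total_weight = sum(weights)
    variance = sum(
        (w / total_weight) ** 2 * se ** 2 for w, se in zip(weights, item_ses)
    )
    return math.sqrt(variance)


def confidence_interval(score, se, z=1.96):
    """Normal-approximation confidence interval around a scale score."""
    return score - z * se, score + z * se


# Table 5.14 weights with made-up item standard errors.
weights = [20, 30, 20, 30]
item_ses = [0.02, 0.03, 0.02, 0.03]

scale_se = scale_standard_error(weights, item_ses)
print(scale_se)
print(confidence_interval(0.64, scale_se))
```

If item estimates were positively correlated within a plan, this calculation would understate the scale standard error, so the independence assumption matters for how wide the resulting intervals are.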

Standard  error  information  is  typically  used  in  one  of  two  related  ways.  First,  standard 
errors  can  be  used  to  produce  confidence  intervals  around  the  scale  scores.  This  would  be  most 
useful  when  presenting  numeric  scale  scores  or  graphs  that  represent  these  scores.  Second, 
standard errors can be used to perform statistical tests. This would be most useful when the
presentation  was  making  some  comparative  claim  like:  “performance  is  above  the  mean.”  There 
is  a  strong  relationship  between  the  two  approaches.  In  the  statistical  star  method  presented 
below  we  will  actually  perform  a  test  by  constructing  a  confidence  interval  and  seeing  if  that 
interval  touches  a  cutpoint. 
CUTPOINTS

The 1997 Quality Compass data were used to model different choices of cutpoints,
comparing  performance  in  the  top  one-third  versus  the  top  one-quarter. 

Within  each  scale,  the  25th  and  75th  percentiles  were  calculated  across  the  distribution  of 
scores  for  all  plans  contracting  with  one  or  more  CARS  Steering  Committee  members.  This 
exercise was then repeated using the 33rd and 67th percentiles. These established the
comparison  points  that  plans  were  tested  against  to  determine  their  star  rating  in  each  category. 

A  statistical  star  system  was  used  to  incorporate  uncertainty  by  using  the  standard  errors 
of  the  scores.  The  method  using  the  25/75  cutpoints  is  described  here.  The  33/67  scale  was 
produced  in  the  same  way  using  the  different  cutpoints.  If  a  plan's  score  fell  below  the  25th 
percentile  nationally,  then  a  95%  confidence  interval  for  the  plan  was  constructed.  If  this 
confidence  interval  was  entirely  below  the  25th  percentile,  the  plan  was  given  one  star.  If  a 
plan's  score  was  above  the  75th  percentile  nationally,  then  the  plan’s  95%  confidence  interval 
was  also  constructed.  If  this  confidence  interval  was  entirely  above  the  75th  percentile,  the  plan 
received  three  stars.  All  other  plans  received  two  stars.  The  logic  was  to  give  one-  or  three-star 
scores  to  only  those  plans  where  we  were  fairly  certain  that  the  plans  were  in  the  bottom  or  top 
quarters  of  the  national  plan  distribution. 
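
The assignment rule described above can be sketched as follows. The function names, the example scores, and the percentile interpolation rule used by `statistics.quantiles` are assumptions for illustration; the report does not specify how the percentiles were interpolated.

```python
import statistics


def quartile_cutpoints(scores):
    """Return the (25th, 75th) percentile cutpoints of the plan score distribution."""
    q1, _median, q3 = statistics.quantiles(scores, n=4)
    return q1, q3


def assign_stars(score, se, low_cut, high_cut, z=1.96):
    """Assign 1, 2, or 3 stars using a 95% confidence interval around the score.

    One star: the whole interval lies below the low cutpoint.
    Three stars: the whole interval lies above the high cutpoint.
    Two stars: everything else, including plans whose interval touches a cutpoint.
    """
    lower = score - z * se
    upper = score + z * se
    if upper < low_cut:  # implies the score itself is below the low cutpoint
        return 1
    if lower > high_cut:  # implies the score itself is above the high cutpoint
        return 3
    return 2


# Hypothetical scale scores for eight plans.
plan_scores = [0.52, 0.58, 0.61, 0.66, 0.70, 0.74, 0.79, 0.85]
low_cut, high_cut = quartile_cutpoints(plan_scores)

for score in plan_scores:
    print(score, assign_stars(score, 0.02, low_cut, high_cut))
```

Under the same assumptions, the 33/67 variant would take the two cutpoints from `statistics.quantiles(scores, n=3)` and leave `assign_stars` unchanged.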

Results 

Tables  5.20  -  5.23  show  the  relationships  between  the  star  assignments  using  the  two 
different cutpoints. As you would expect, the 33/67 method assigns more plans to the one- and
three-star categories.
Table 5.20


Consumer  Satisfaction:  Cutpoint  Comparisons 


Table  5.21 


Staying  Healthy:  Cutpoint  Comparisons 


Table  5.22 


Access  and  Service:  Cutpoint  Comparisons 


Table  5.23 


Getting  Better/Living  with  Illness:  Cutpoint  Comparisons 
Policy Implications and Recommendations

The  CARS  Steering  Committee  elected  to  characterize  performance  using  symbols 
(rather  than  numbers  or  graphs)  and  chose  stars  as  the  symbol.  Ford  and  DaimlerChrysler  had 
previously  used  stars  to  characterize  plan  performance;  GM  had  used  triangles.  Plans  were 
assigned  one,  two,  or  three  stars  within  each  reporting  category  based  on  a  statistical  test  (we 
referred  to  this  approach  as  “statistical  stars”).  By  contrast,  the  Ford  1997  system  assigned  stars 
without  a  statistical  test;  the  plan  distributions  were  divided  into  three  equal  sized  groups.  The 
statistical  test  provides  a  more  defensible  method  of  assigning  plans  to  result  groups  because  it 
takes  account  of  the  uncertainty  that  results  from  drawing  samples  to  estimate  performance. 
Because the CARS Steering Committee wanted the stars to clearly distinguish among different
levels of performance, the more stringent cutpoints were selected (25th and 75th percentiles).


CHAPTER  6 

REPORTING  STRATEGY 


The  Steering  Committee  also  had  to  agree  on  how  the  results  would  be 
communicated  both  internally  to  benefits  managers  and  externally  to  employees  during 
open  enrollment.  The  Steering  Committee  agreed  that  the  presentation  of  results  should 
be  uniform  across  the  open  enrollment  materials  for  each  of  the  three  automobile 
manufacturers.  This  included  the  order  in  which  categories  were  presented  and  the 
method  of  communicating  scores  (i.e.,  stars). 

Two  major  options  were  considered:  (a)  presenting  numeric  scores  or  (b) 
representing  results  with  symbols.  The  CARS  Steering  Committee  elected  to  use 
symbols  and  chose  stars  as  the  symbol.  The  major  reason  for  this  choice  was  that  all 
three  automobile  companies  had  previously  used  symbols  in  reporting  performance 
results.  There  is  some  evidence  that  consistency  in  reporting  improves  the  likelihood  that 
people  will  use  the  information.  Two  of  the  companies  (Chrysler  and  Ford)  had  used 
stars  previously;  GM  had  used  triangles.  All  three  companies  had  used  three  levels  of 
performance  to  report  results.  None  of  the  companies  had  used  a  statistical  method  for 
determining  the  placement  of  plans  in  result  groups. 

The  1997  Quality  Compass  data  were  used  to  model  different  choices  of 
cutpoints,  comparing  performance  in  the  top  one-third  versus  the  top  one-quarter. 

Because  the  CARS  Steering  Committee  wanted  the  stars  to  clearly  distinguish  among 
different  levels  of  performance  the  more  stringent  cutpoints  were  selected  (25th  and  75th 
percentiles). 

Within  each  scale,  the  25th  and  75th  percentiles  were  calculated  across  the 
distribution  of  scores  for  all  plans  contracting  with  one  or  more  CARS  Steering 
Committee  members.  If  a  plan’s  score  fell  below  the  25th  percentile  nationally,  we 
constructed  a  95%  confidence  interval  for  the  plan.  If  this  confidence  interval  was 
entirely below the 25th percentile, the plan received one star. If a plan's score was above
the 75th percentile nationally, we also constructed a 95% confidence interval for the plan.
If this confidence interval was entirely above the 75th percentile, the plan received three
stars.  All  other  plans  received  two  stars.  The  logic  was  only  to  give  one  or  three  star 
scores  to  plans  where  we  were  fairly  certain  that  the  plans  were  in  the  bottom  or  top 
quarters  of  the  national  plan  distribution.  This  method  was  referred  to  as  “statistical 
stars”  to  reflect  the  use  of  statistical  principles  to  make  assignments  to  result  categories. 
The Steering Committee valued the opportunity to incorporate statistical uncertainty into
the  performance  reports — the  statistical  star  model  facilitated  this  objective. 

The  primary  media  for  transmitting  results  were  printed  brochures  that  were  part 
of  the  open  enrollment  materials.  Employees  received  information  on  those  health  plans 
in  their  geographic  area  which  were  open  for  enrollment.  In  most  areas,  the  performance 
reports  were  not  available  for  the  indemnity  or  traditional  plans.  GM  did  provide  survey 
data  for  scoring  traditional  indemnity  plans  and  preferred  provider  organizations  in  three 
states  (Michigan,  Ohio,  Indiana)  on  two  of  the  performance  categories  (Consumer 
Satisfaction  and  Access  and  Service). 

In  addition  to  open  enrollment  materials,  information  was  made  available  to 
employees  on  internal  Web  sites.  The  Greater  Detroit  Area  Health  Council  also  reported 
the  results  on  its  Web  page  for  plans  operating  in  Southeast  Michigan 
(http://www.hiag.org/qat/hmoguide.htm). 

A  sample  report  card  is  shown  in  Figure  6.1. 


Sample  CARS  '98  Report 


Figure  6.1 — Sample  Report  Card 


CHAPTER  7 

OBTAINING  DATA  FROM  HEALTH  PLANS 


The  three  data  sources  for  the  CARS  model  are  HEDIS  3.0/1998,  the  NCQA 
Membership  Satisfaction  Survey  (MSS),  and  NCQA  Accreditation  Status.  The  first  two 
data  sources  are  available  from  NCQA’s  Quality  Compass.  NCQA's  Quality  Compass  is 
a  national  CD-ROM  database  of  plan  specific  performance  information  including  nearly 
300  health  plans.  However,  in  consultation  with  NCQA,  RAND  and  the  Steering 
Committee  concluded  that  a  direct  data  feed  from  NCQA  would  not  be  available  in  time 
to  produce  scores  on  the  schedule  required  by  GM  and  DaimlerChrysler  for  their  open 
enrollment  materials.  Further,  NCQA  would  only  be  able  to  provide  data  for  the  health 
plans  that  reported  to  Quality  Compass  for  public  release.4  Of  the  131  unique  managed 
care  plans  with  which  DaimlerChrysler,  Ford  and  GM  contract,  76  plans  publicly 
reported  their  HEDIS  3.0/1998  performance  to  Quality  Compass.  Due  to  the  time 
constraints  as  well  as  the  fact  that  not  all  plans  report  to  Quality  Compass  for  public 
release,  the  data  were  obtained  directly  from  the  health  plans. 

PROCESS  FOR  OBTAINING  DATA 

A  combined  list  of  plans  that  contracted  with  one  or  more  of  the  companies  was 
developed  to  coordinate  the  data  requests.  Table  7.1  shows  the  number  of  plans  each 
company  contracted  with  and  the  amount  of  overlap  in  the  contracts.  All  plans  were 
contacted  in  writing  during  May  1998  and  informed  that  June  15, 1998  was  the  deadline 
to  submit  HEDIS  and  MSS  data.  GM  took  the  lead  on  requesting  data  from  the  health 
plans  with  which  it  held  contracts,  many  of  which  also  held  contracts  with  Ford  and/or 
DaimlerChrysler  (see  Table  7.1).  Representatives  from  Ford  and  DaimlerChrysler 
contacted  the  remaining  health  plans. 


4  Health  plans  have  the  option  to  report  their  performance  to  NCQA  for  use  in  calculating 
national  average  performance  without  allowing  their  data  to  be  available  for  public  release  to  consumers, 
purchasers,  or  researchers. 
Table 7.1

Number of Health Plan Contracts by Company and Degree of Overlap

Contract Description                  Number of Plans*
Chrysler (all plans)                  24
Ford (all plans)                      73
GM (all plans)                        108
Plans that contract with all three    14
Plans that contract with two          46

* Includes some plans that were being considered for inclusion for the first time; in the case of multiple
contracts with a plan, the plan is only counted once.


NCQA  allowed  the  plans  that  intended  to  report  to  Quality  Compass  to  send 
RAND  copies  of  the  HEDIS  3.0/1998  Data  Submission  Tool  (DST)  for  use  in  the  CARS 
project.  The  DST  is  an  electronic  file  and  was  submitted  by  the  plans  to  RAND  either  on 
disk  or  via  electronic  mail.  Plans  not  reporting  to  Quality  Compass  could  not  legally  use 
the  DST,  so  a  separate  survey  for  the  CARS  project  was  developed  to  obtain  the  relevant 
HEDIS  and  MSS  results.  The  CARS  data  collection  survey  was  available  to  the  health 
plans  in  both  an  electronic  and  paper  form.  Representatives  from  the  health  plans  were 
able  to  submit  the  completed  survey  via  fax  or  electronic  mail. 

The  MSS  data  were  not  included  in  the  DST.  The  standard  format  in  which  the 
MSS  data  were  received  was  a  text  file  that  met  NCQA  specifications  to  report  plan-level 
MSS  results.  If  plans  did  not  have  the  MSS  text  file  available  for  submission,  they  were 
able  to  use  the  CARS  data  collection  survey.  Table  7.2  shows  the  number  of  plans  that 
submitted  HEDIS  and  MSS  data  in  the  various  formats. 

Table 7.2

Number of Health Plans by Submission Format

                                              Number of Plans by Type of Data
Data Submission Format                        HEDIS     MSS
Preferred format (e.g., DST and text file)    80        43
CARS survey                                   33        49
Member level data                             --        3
Hard-copy                                     15        31
No data submitted                             3         5
TOTAL                                         131       131


HEDIS  data  were  received  for  128  health  plans.  Of  the  48  health  plans  that  did 
not  submit  HEDIS  data  via  the  DST,  33  plans  used  the  CARS  survey.  The  remaining 
plans  submitted  a  hard-copy  summary  of  their  HEDIS  performance  results.  A  total  of 
126  plans  submitted  MSS  data.  Of  the  83  plans  that  did  not  submit  the  standard  text  file, 
49  plans  used  the  CARS  data  collection  survey.  Three  plans  submitted  member  level 
data  files  that  required  RAND  to  generate  the  summary  statistics  used  in  the  aggregation 
method.  The  remaining  plans  submitted  a  hard-copy  summary  of  their  MSS  results. 

NCQA  accreditation  status  is  available  on  NCQA’s  Web  page  (www.ncqa.org). 
The  Accreditation  Status  List  (ASL)  is  updated  by  NCQA  on  a  monthly  basis. 
Consequently,  there  is  a  potential  lag  between  a  change  in  accreditation  status  and  when 
that  information  is  available  through  NCQA’s  Web  page.  The  results  used  by  the  auto 
manufacturers  in  their  open  enrollment  materials  were  based  on  information  on  the  Web 
as  of  August  3, 1998. 

DATA  VERIFICATION 

The  initial  requests  for  data  were  sent  to  the  health  plans  to  coincide  with  the  date 
that  initial  Quality  Compass  reports  were  due  to  NCQA.  Because  NCQA  engages  in  a 
process  of  reviewing  results  and  correcting  errors  in  the  initial  data  submissions,  RAND 
chose to mirror this process. RAND developed a form to feed back the raw inputs used in
the  development  of  the  performance  scores  (Appendix  A).  The  verification  reports  were 
sent  to  all  plans  submitting  data  to  ensure  that  the  information  was  correct.  In  addition, 
five  days  before  the  deadline  for  data  submission  all  plans  received  an  information  sheet 
explaining  the  consequences  of  not  reporting  (i.e.,  imputation  of  the  lowest  reported 
value  for  missing  data).  Missing  data  were  highlighted  and  the  plans  had  a  final 
opportunity  to  submit  data.  Table  7.3  shows  the  data  collection  timeline. 
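
The non-reporting rule mentioned above can be sketched as follows. The sketch assumes that "lowest reported value" means the minimum across all plans that reported the measure in question; the plan names, rates, and function name are hypothetical.

```python
def impute_minimum(values_by_plan):
    """Replace missing values (None) with the lowest value reported by any plan."""
    reported = [v for v in values_by_plan.values() if v is not None]
    if not reported:
        raise ValueError("no plan reported this measure")
    floor = min(reported)
    return {
        plan: (floor if value is None else value)
        for plan, value in values_by_plan.items()
    }


# Hypothetical rates for one measure; Plan B did not report.
rates = {"Plan A": 0.80, "Plan B": None, "Plan C": 0.60}
print(impute_minimum(rates))
```

Because a non-reporting plan is scored as if it were the worst reporter, the rule gives plans an incentive to submit complete data rather than withhold weak results.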
Table 7.3

Data Collection Timeline

                                                                          Data Received (% of plans)
Date           Milestone                                                  HEDIS    MSS
5/22/98        Data request memo to plans                                 --       --
6/15/98        Submissions due to NCQA and RAND                           0        0
7/98 - 8/98    RAND sends data verification reports                       --       --
7/17/98        Corrected submissions due to NCQA and RAND                 43       37
7/27/98        RAND memo to plans stating final deadline of August 1st    62       55
8/03/98        FINAL deadline for data submissions                        92       79
8/14/98        Cutpoints determined                                       95       94

SUMMARY  OF  DATA  COMPLETENESS 

Twelve  HEDIS  measures  were  included  in  the  CARS  model.  For  those  plans  that 
submitted  HEDIS  data,  there  was  an  average  of  one  and  a  maximum  of  seven  missing 
measures.  The  number  of  missing  measures  did  vary  by  data  submission  format.  Plans 
that  used  the  DST  were  missing  less  than  one  measure  on  average,  while  plans  that  used 
alternative  submission  formats  were  missing  1.8  measures  on  average.  Table  7.4  shows 
the  completeness  of  reporting  for  each  HEDIS  measure. 
Table 7.4

Percent of Plans Reporting HEDIS Measures by Submission Type

                                                 Percent of Plans Reporting a HEDIS Measure by Submission Type
Measure                                          DST    Other submission format    Total
Childhood immunization rate                      93     94                         93
Adolescent immunization rate                     84     65                         77
Prenatal care - first trimester                  98     100                        99
Check-ups after delivery                         96     96                         96
Well-child visits in first 15 months of life     96     75                         89
Well-child visits in 3rd-6th year of life        96     90                         94
Adolescent well-care visits                      95     92                         94
Breast cancer screening                          100    100                        100
Cervical cancer screening                        100    100                        100
Eye exams for diabetics                          96     96                         96
Follow-up after mental health hospitalization    95     56                         81
Beta blocker treatment after a heart attack      81     58                         73

The  CARS  model  included  29  items  from  the  MSS.  On  average,  plans  were 
missing  less  than  one  MSS  measure.  The  maximum  number  of  missing  MSS  measures 
was  27;  this  came  from  a  plan  that  submitted  the  results  of  only  two  global  assessment 
questions.  The  plan  either  did  not  field  the  entire  MSS  or  failed  to  submit  the  results  of 
the  entire  survey.  None  of  the  plans  that  used  the  standard  text  file  data  submission 
format  were  missing  any  of  the  MSS  items.  One  of  the  performance  measures,  advice  to 
quit  smoking,  is  derived  from  the  MSS. 


CHAPTER  8 
RESULTS 


The  three  auto  manufacturers  spend  approximately  $8  billion  annually  to  provide 
health insurance for over 2 million people, and contract with 131 managed care plans in 34
different states. Thus, the potential effect of this national-level effort to assess health
plan performance is quite large. In several Midwestern states, these companies are
among the largest employers operating and thus are a significant portion of the business
for  health  plans  delivering  services  in  these  markets. 

Results  of  the  health  plan  performance  evaluation  were  published  in  1999  open 
enrollment  materials  for  the  employees  of  DaimlerChrysler,  Ford,  and  GM.  The  results 
for  Southeastern  Michigan  were  also  made  available  on  GDAHC’s  Web  page 
(http://www.hiag.com). 

Table  8.1  shows  the  distribution  of  stars  across  the  plans  by  performance 
category.  For  each  reporting  category,  at  least  one-half  of  the  health  plans  received  two 
stars.  There  was  a  range  of  16  to  21  percent  of  the  plans  performing  below  the  25th 
percentile  for  each  of  the  reporting  categories;  these  plans  received  one  star.  For  each 
reporting  category,  14  to  20  percent  of  the  plans  received  three  stars  because  they  clearly 
performed  above  the  75th  percentile. 


Table 8.1

Distribution of Plan Results by Performance Category

                                      Percent of plans with
Category                              One star    Two stars    Three stars
Consumer Satisfaction                 16          70           14
Access and Service                    22          58           20
Staying Healthy                       16          69           15
Getting Better/Living with Illness    21          59           20

Due  to  the  minimum  imputation  rule  for  missing  values,  a  plan’s  performance 
within  a  category  was  diminished  if  component  items  were  not  reported.  Table  8.2  shows 
the  average  number  of  missing  items  for  each  performance  category  by  the  number  of 
stars that the plan received. There was one plan that received three stars in the Staying
Healthy category even though it was missing one component of the
reporting category; no other plans with missing data received three stars in the category
where the data were missing.


Table 8.2

Average Number of Missing Items by Score in each Performance Category

                                      Average Number of Missing Items by Score
Category                              One star    Two stars    Three stars    Total Number of Items in Category
Consumer Satisfaction                 1.67        0            0              5
Access and Service                    5.9         0.1          0              23
Staying Healthy                       3.9         0.5          0.1            9
Getting Better/Living with Illness    2.3         0.5          0              4

Plans  showed  differential  performance  across  the  four  reporting  categories.  Table 
8.3  shows  the  percent  of  plans  that  received  the  same  score  in  each  category  as  well  as 
the  percent  of  plans  that  received  a  one  star  or  three  star  score  in  at  least  one  category. 
Five  percent  of  the  plans  received  one  star  in  at  least  one  category  and  three  stars  in 
another  category.  Four  percent  of  the  plans  received  one  star  in  every  category,  25 
percent  of  the  plans  received  two  stars  in  all  four  reporting  categories,  and  three  percent 
of  the  plans  received  three  stars  in  every  category. 


Table 8.3

Percent of Plans by Score

Score                                   Percent of plans
One star in every category              3.8
Two stars in every category             25.2
Three stars in every category           3.1
One star in at least one category       42.1
Three stars in at least one category    36.9

Table  8.4  gives  the  results  of  CARS  ’98  by  health  plan  name.  NCQA 
Accreditation  is  shown  as  1  for  accredited  and  0  for  not  accredited.  While  NCQA  has 
many  more  categories  of  accreditation,  the  group  elected  this  dichotomous  approach  to 
simplify  reporting.  The  next  four  columns  give  the  actual  score  for  each  plan  by 
category.  These  scores  reflect  the  performance  result  times  the  weight  and  are  not 
directly  interpretable.  The  last  four  columns  give  the  number  of  stars  that  were  assigned 
based on the score, the confidence interval around that score and its relationship to the
selected  cutpoints. 


Table  8.4 

Coordinated  Autos/UAW  Reporting  System 
(CARS  ’98)  FINAL  RESULTS 
Health Plan Name

Aetna  U.S.  Healthcare  -  Dallas/Fort  Worth  Metroplex 

Aetna U.S. Healthcare - Eastern/Central Pennsylvania

Aetna  U.S.  Healthcare  -  Greater  Houston  Area 

Aetna  U.S.  Healthcare  -  New  York 

Aetna  U.S.  Healthcare  -  Northern  New  Jersey 

Aetna  U.S.  Healthcare  -  Southern  New  Jersey/Delaware 

Anthem  BCBS  of  OH  (Health  Maintenance  Plan) 

Anthem  PMP(OhioPPO) 

Anthem  PPN  (IN  PPO) 

AvMed  Health  Plan  -  Florida 

BCBS  Michigan  -  PPO 

BCBS  Michigan  -  Traditional 

Blue  Care  Health  Plan 

Blue  Care  Network,  East  Michigan  Region 

Blue  Care  Network,  Mid  Michigan  Region 

Blue  Care  Network,  Southeast  Michigan  Region 

Blue  Care  Network,  West  Michigan  Region 

Blue  Choice  (Arizona) 

Blue  Cross  Blue  Shield  of  Kansas  City 

Blue  Plus  (Minnesota) 

BlueChoice  Healthcare  Plan 

BlueLincs  HMO 

CIGNA  Healthcare  of  Arizona  -  Tucson 

CIGNA  Healthcare  of  Florida 

CIGNA  Healthcare  of  Georgia 

CIGNA  Healthcare  of  Louisiana 

CIGNA  Healthcare  of  North  Texas  -  Dallas 

CIGNA  Healthcare  of  OH/Cincinnati 

CIGNA  Healthcare  of  Ohio  -  Columbus 

CIGNA  Healthcare  of  South  Texas  -  Houston 

CIGNA  Healthcare  of  Tennessee 

Capital  District  Physicians  Health  Plan 

Care  Choices  HMO/Mercy  Health  Plans 

Care  Choices  of  Iowa 

Community  Health  Plan 

Compcare  -  Wisconsin 

ConnectiCare,  Inc. 

Dean  Health  Plan,  Inc. 

Exclusive  Healthcare,  Inc.  -  Omaha 

Fallon  Community  Health  Plan 

Family  Health  Plan  Cooperative 

Finger  Lakes  Health  Insurance  Co. 

Freestate HP/CareFirst (includes Columbia Medical Plan)

Humana  Tampa 

Indiana  Anthem  BCBS  (Traditional) 

Intergroup  of  Arizona 

Kaiser  Foundation  Health  Plan  of  Colorado 

Kaiser  Foundation  Health  Plan  of  Georgia,  Inc. 

Kaiser  Foundation  Health  Plan  of  Kansas  City,  Inc. 

Kaiser  Foundation  Health  Plan  of  North  Carolina 

Kaiser  Foundation  Health  Plan  of  Ohio 

Kaiser Foundation Health Plan of So. California

Kaiser  Foundation  Health  Plan  of  Texas 

Kaiser Foundation Health Plan of the Mid-Atlantic States

Kaiser  Foundation  Health  Plan  of  the  Northwest 

Kaiser  Foundation  Health  Plan,  Inc.  Hawaii  Region 

Kaiser  Foundation  Health  Plan,  Inc.  No.  California 

Kaiser  Foundation  Health  Plan,  Northeast  Region 

Lovelace 

M  Plan 

M-Care 

Maxicare  Indiana,  Inc. 

Medical  Value  Plan 

Mercycare  Health  Plans 

NYLCare of the Mid-Atlantic

Ochsner  Health  Plan 

Ohio  Anthem  BCBS  (Traditional) 

Omnicare 

Optima  Health  Plan 

PARTNERS  National  Health  Plans  of  Indiana 

PHP-NY 

PacifiCare  of  California 

PacifiCare  of  Nevada 

PacifiCare  of  Oklahoma 

PacifiCare  of  Oregon 

PacifiCare  of  Texas 

Paramount  Health  Care 

Physicians  Health  Plan  of  Mid  Michigan,  Inc. 

Physicians  Health  Plan  of  South  Michigan,  Inc. 

Physicians  Health  Plan  of  Southwest  MI,  Inc. 

Physicians  Health  Plan  of  West  Michigan,  Inc. 

Preferred  Care,  Inc. 

PrimeCare Select

Priority  Health  -  Michigan 

Prudential  Healthcare  -  Florida  (Jacksonville) 

Prudential  Healthcare  -  Missouri  (St.  Louis) 

CHAPTER 9

DISCUSSION  AND  NEXT  STEPS 


The  purpose  of  this  report  is  to  document  in  detail  the  process,  considerations,  and 
decisions  that  were  made  in  developing  a  coordinated  strategy  for  reporting  on  health 
plan  performance.  The  intent  of  the  CARS  Steering  Committee  from  the  beginning  was 
to  place  these  methods  in  the  public  domain  so  that  they  would  be  accessible  to  others 
interested  in  pursuing  a  similar  strategy  for  reporting. 

IMPORTANCE  OF  A  COORDINATED  STRATEGY 

Just a decade ago, no systematic information was routinely available on the
quality of health services being delivered across the country. With the introduction of
HEDIS, this has begun to change. We have rapidly moved from not having enough
information to having more than can readily be used in decision making. Thus, strategies
for summarizing results have become of great interest. The lay press in particular has
undertaken efforts to provide simple summary scores on health plan performance using
available information. Unfortunately, differences in the methods by which summary
scores are derived can lead to differences in the conclusions that are drawn. When
consumers are faced with conflicting information in an area that has previously been
characterized by little information, the most natural response is to ignore the new
information. This inhibits rather than promotes the use of performance data
for decision making. Thus, finding common methods for drawing summary conclusions
from a broad array of data is essential for increasing the likelihood that such information
will be used in decision making.
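
To illustrate how the choice of method can change the conclusion, the following minimal sketch ranks two plans under two weighting schemes. Everything in it is hypothetical: the plan names, the two measure categories, the scores, and the weights are invented for illustration and are not taken from CARS or HEDIS, and the weighted average is only one of many possible summary methods.

```python
def summary_score(measures, weights):
    """Weighted average of one plan's measure scores."""
    total_weight = sum(weights[name] for name in measures)
    weighted_sum = sum(measures[name] * weights[name] for name in measures)
    return weighted_sum / total_weight


def rank_plans(plans, weights):
    """Return plan names ordered from highest to lowest summary score."""
    return sorted(plans,
                  key=lambda plan: summary_score(plans[plan], weights),
                  reverse=True)


# Hypothetical scores on a 0-100 scale (not real plan data).
plans = {
    "Plan X": {"prevention": 80.0, "satisfaction": 60.0},
    "Plan Y": {"prevention": 65.0, "satisfaction": 78.0},
}

# Two hypothetical weighting schemes applied to the same data.
equal_weights = {"prevention": 0.5, "satisfaction": 0.5}
prevention_heavy = {"prevention": 0.7, "satisfaction": 0.3}

ranking_equal = rank_plans(plans, equal_weights)
ranking_heavy = rank_plans(plans, prevention_heavy)

print("Equal weights:   ", ranking_equal)
print("Prevention-heavy:", ranking_heavy)
```

With these invented numbers, the same underlying data place Plan Y first under equal weights and Plan X first when prevention is weighted more heavily, which is the kind of conflict a common method is meant to avoid.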

UTILITY  OF  REPORTS  FOR  CONSUMERS 

How  useful  was  the  current  report  card  for  consumers?  That  question  is  currently 
being  evaluated  at  the  three  automobile  manufacturers  and  remains  to  be  answered.  But 
it  is  worth  noting  that  the  process  of  changing  the  way  people  evaluate  health  plans  and 
make  decisions  may  require  several  years  and  multifaceted  educational  efforts. 

Evaluation  of  the  current  report  card  is  critical,  however,  for  identifying  potential 
improvements  in  subsequent  years.  On  the  other  hand,  consistency  over  time  may  be 
important  for  ensuring  that  the  reports  get  used. 

A  recent  evaluation  conducted  for  NCQA  on  the  utility  of  report  cards  for 
consumers  (NCQA  and  IRE,  1998)  reached  a  number  of  conclusions  that  are  worth 
bearing  in  mind  when  evaluating  the  current  CARS  efforts. 
• People who are most likely to remember seeing report cards are those who:

-  tend  to  use  written  comparative  information  in  making  decisions; 

-  have  an  interest  in  health  and  health  care; 

-  are  considering  changing  health  plans. 

•  The  people  who  find  report  cards  useful  are: 

-  women  (who  often  make  health  care  decisions  in  households); 

-  those  looking  to  confirm  that  their  current  choice  is  good; 

-  those  considering  changing  plans; 

-  those  with  limited  experience  with  the  health  sector; 

-  those  with  chronic  diseases. 

• Most report cards have not been shown to have a large impact on consumer
decision making, but:

-  a  small  number  of  opinion  leaders  using  information  may  be  all  that  is 
required  to  shape  market  response; 

-  choosing  among  health  plans  is  not  a  salient  activity  for  everyone,  so  those 
most  motivated  or  in  need  of  information  may  represent  the  target  market 
(rather  than  everyone  in  an  eligible  population). 

• Report cards are most likely to be useful when:

-  real  differences  among  health  plans  exist; 

-  choices  have  consequences; 

-  the  content  is  perceived  to  be  relevant; 

-  the  unit  of  analysis  is  meaningful; 

-  dissemination  strategies  match  needs. 

These points are certainly worth keeping in mind as the collaborative efforts
among the automobile manufacturers move forward. All of these conclusions
suggest that a long-term, rather than short-term, view is essential.

HOW  USEFUL  IS  THIS  REPORT  FOR  OTHER  PURCHASERS? 

Are  there  attributes  of  the  automobile  manufacturers,  their  employees,  or  their 
geographic  location  that  make  the  current  effort  unique  and  not  likely  to  be  duplicated 
with  other  employers?  A  fair  answer  is  probably  yes  and  no. 

For a variety of reasons, the automobile manufacturers and the UAW have taken a
strong leadership position in developing information about quality. Some of this likely
stems from the tradition of quality improvement in the manufacturing process. Many of
the techniques that are now being used in health care systems to improve care delivery
had their beginnings in manufacturing processes. The role of the UAW cannot be
overstated. The union represents a large labor force, and to conduct collective
bargaining for its members effectively, there must be some consistency in the
methods by which different benefit choices are evaluated. Thus, the pressure to find
common ground is reinforced by the UAW, which requires such information to do
its job. The concentration of the industry in a few states in the Midwest also places the
automobile manufacturers in a unique role in their communities: as some of the largest
employers, they have a significant impact on quality of life in the region. Service to the
community is an important part of the tradition of corporate responsibility. Finally,
many employees stay with these companies for a long time. Thus, the
automobile companies are likely to reap the gains of improved quality over the long
run, because the workforce will be healthier and more productive. Quality in this case
makes good business sense.

Does this mean that other employers will not find this approach useful?
Absolutely not. Everyone stands to gain from consistent demands for information about
quality and expectations that quality will improve over time. This may be most likely to
occur if common methods are used to make judgments, so that there is consistency in
determining who is performing well and who needs help improving. It is also
important to note that health plans might well serve some populations better than others,
which may give some plans higher value for some employers than for others.
Information that both allows consumers to find a plan that meets their needs and
helps purchasers set premiums that offer a fair price is essential to making this
scenario work.

WHAT  ARE  THE  LIMITATIONS  OF  THE  CURRENT  WORK? 

One of the critical limitations of this work is that the evaluations are applied to
just one sector of the health delivery system: managed care plans. In fact, two-thirds of
the automobile manufacturers' beneficiaries are currently covered by traditional indemnity
insurance (TRAD) or preferred provider organizations (PPO). Failing to provide
information about the performance of the TRAD sector creates several problems. First,
this may send a signal that performance in the traditional sector is optimal and requires
no monitoring to identify opportunities for improvement. This conclusion is inconsistent
with a body of research literature that finds few differences between managed care and
fee-for-service delivery. Second, monitoring only the managed care sector may impose
unfair costs of doing business on that sector, since plans must spend money to collect data
on quality but the indemnity sector is not required to do so. Third, it may be confusing to
consumers to have information on only a subset of their health insurance choices. This may
cause people to disregard the information on quality or misinterpret it.

The  information  presented  in  this  report  is  based  on  the  data  that  are  currently 
publicly  available.  Certainly  everyone  agrees  that  there  are  a  number  of  areas  in  which 
more  information  is  required.  For  example,  there  are  considerably  fewer  measures 
available  on  the  quality  of  care  for  acute  conditions  or  management  of  chronic  illnesses 
than there are for preventive services. Much more information is available on the quality
of care for women than for men, and for adults than for children. NCQA and others are working to
fill  these  gaps,  but  this  will  take  time.  In  the  meantime,  we  may  have  to  make  do  with  a 
less  than  perfect  set  of  information  for  some  important  areas. 

More  sophisticated  research  methods  could  have  been  used  for  some  of  the  key 
decisions.  For  example,  we  could  have  used  regression  imputation  methods  to  more 
closely  estimate  actual  performance  of  plans  with  missing  data  on  certain  measures. 
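
As a rough illustration of what regression imputation could look like, the sketch below fits a straight line between two measures using plans that reported both, then predicts the missing value for a plan that reported only one. This is not the method used in CARS '98, which did not use regression imputation. The plan labels, the pairing of measures, and all rates are hypothetical, and a real application would likely use several predictors and account for uncertainty in the imputed values.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def impute(plans, predictor, target):
    """Return copies of the plan records with missing target values predicted.

    The line is fitted only on plans that reported both measures.
    """
    complete = [p for p in plans
                if p[predictor] is not None and p[target] is not None]
    intercept, slope = fit_line([p[predictor] for p in complete],
                                [p[target] for p in complete])
    filled = []
    for plan in plans:
        record = dict(plan)  # copy so the input records are left unchanged
        if record[target] is None and record[predictor] is not None:
            record[target] = intercept + slope * record[predictor]
        filled.append(record)
    return filled


# Hypothetical rates (percent); None marks a measure the plan did not report.
plans = [
    {"plan": "A", "breast": 70.0, "cervical": 75.0},
    {"plan": "B", "breast": 80.0, "cervical": 85.0},
    {"plan": "C", "breast": 60.0, "cervical": 65.0},
    {"plan": "D", "breast": 75.0, "cervical": None},
]

filled = impute(plans, predictor="breast", target="cervical")
print(filled[3])
```

Because the invented data lie exactly on a line, the predicted value for Plan D follows that line; with real data the fit would be approximate and the imputed value only an estimate.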

Other  approaches  to  weighting  individual  measures  or  to  combining  measures  into 
overall  scores  are  possible.  While  it  is  certainly  worth  exploring  whether  these  methods 
would  produce  different  results,  there  are  challenges  in  determining  what  represents  the 
best  method.  We  have  no  real  “gold  standard”  against  which  summary  scores  can  be 
compared  to  determine  their  predictive  value.  The  principal  method  available  today  is 
face  validity  (does  the  method  make  sense,  do  the  “right”  plans  seem  to  do  well  or  poorly 
on  the  final  scores).  But  this  is  somewhat  circular — it  presupposes  that  we  already  know 
who  the  winners  and  losers  are  and  that  if  the  facts  don’t  conform  to  those 
presuppositions  then  the  facts  or  methods  are  wrong.  So,  continued  evaluation  of  the 
methods  is  important  but  difficult. 

WHAT  ARE  THE  NEXT  STEPS? 

At  the  beginning  of  the  project,  it  was  agreed  that  this  process  was  an  iterative  one 
that  would  take  place  over  several  years.  Establishing  a  quality  improvement  cycle  that 
allows  each  subsequent  iteration  to  address  problems  encountered  in  the  last  iteration 
should  facilitate  development  of  a  system  that  optimizes  the  use  of  available  information 
and  that  perhaps  enhances  the  likelihood  that  more  information  will  be  made  available. 

We have proposed that some key issues be addressed in the coming
year:

•  Incorporating  reporting  on  TRAD 

•  Investigating  the  contribution  of  stronger  scientific  methods 

•  Increasing  the  number  of  scoring  categories 
• Obtaining feedback on the utility of current reports (and suggestions for how
reports could be made more useful)

CARS ’98 represented a substantial contribution to making data useful for
decision making. The collaborative nature of the project, which brought together competing
employers, union representatives, public purchasers, business groups, and researchers to
produce consensus and implement the decisions in a four-month time frame, is quite
remarkable. It is certainly well worth continuing to build on this foundation.


APPENDIX  A 

SAMPLE  VERIFICATION  REPORT 
Health Plan Name                               Health Plan A
NCQA Organization ID                           XXX

Childhood Immunization Rate (Combo 1)          71.29
Adolescent Immunization Rate (Combo 1)          8.76
Prenatal Care in the 1st Trimester             71.76
Breast Cancer Screening                        77.18
Cervical Cancer Screening                      82.35
Check-up After Delivery                        70.22
Well-child visits: 0-15 months (0 visits)       4.64
Well-child visits: 3-5 years                   55.13
Adolescent Well-Care Visits                    33.25
Follow-Up for MH Hospitalization               71.57
Eye Exams for Diabetics                        35.87
Beta-Blocker Treatment after Heart Attack


Health Plan Name                               Health Plan A
NCQA Organization ID                           XXX

7a. Ease of making appointments for medical care by phone
    no experience  16
    poor  33
    fair  75
    good  199
    very good  219
    excellent  184
    missing  7

7b. Length of time you had to wait between making an appointment for routine care and
the day of your visit
    no experience  20
    poor  67
    fair  109
    good  220
    very good  198
    excellent  107
    missing  12

7c. Thoroughness of treatment you received
    no experience  17
    poor  20
    fair  68
    good  195
    very good  261
    excellent  162
    missing  10

7d. Attention given to what you had to say
    no experience  18
    poor  20
    fair  82
    good  190
    very good  239
    excellent  177
    missing  7

7e. Number of doctors you had to choose from
    no experience  76
    poor  45
    fair  83
    good  213
    very good  189
    excellent  102
    missing  25

7f. Ease of choosing a personal physician
    no experience  52
    poor  37
    fair  85
    good  216
    very good  188
    excellent  132
    missing  23

7g. Amount of time you had with doctors and staff during a visit
    no experience  22
    poor  42
    fair  90
    good  226
    very good  230
    excellent  114
    missing  9

7h. How much you were helped by the care you received
    no experience  22
    poor  20
    fair  77
    good  228
    very good  250
    excellent  128
    missing  8

7i. Overall quality of care and services
    no experience  18
    poor  13
    fair  69
    good  220
    very good  272
    excellent  136
    missing  5

8a. Types of services the plan covered
    no experience  38
    poor  24
    fair  79
    good  234
    very good  228
    excellent  116
    missing  14

8b. Availability of information from your plan about eligibility; covered services or
administrative issues
    no experience  55
    poor  35
    fair  81
    good  257
    very good  186
    excellent  98
    missing  21

8c. Availability of information from your doctors or plan about costs of care
    no experience  101
    poor  43
    fair  91
    good  220
    very good  171
    excellent  81
    missing  26

8d. Length of time you had to spend filling out claim forms or other paperwork
    no experience  136
    poor  14
    fair  37
    good  138
    very good  188
    excellent  205
    missing  15

9a. Delays in your medical care while you waited for approval by your health plan
    big problem  36
    small problem  79
    not a problem  585
    missing  33

9b. Difficulty in receiving care you and your doctor believed was necessary
    big problem  42
    small problem  72
    not a problem  585
    missing  34

9c. Not being able to get a referral to a specialist that you wanted to see
    big problem  71
    small problem  81
    not a problem  538
    missing  43

10. Have you called or written your health plan with a complaint or problem in the last 12
months?
    yes  126
    no  588
    missing  19

10a. How long did it take for the health plan to resolve your complaint?
    same day  37
    1 week  14
    2 weeks  6
    3 weeks  3
    4 weeks  24
    not yet  40
    missing  2

11. All things considered, how satisfied are you with your current health plan?
    completely satisfied  141
    very satisfied  317
    somewhat satisfied  139
    neither satisfied nor dissatisfied  50
    somewhat dissatisfied  42
    very dissatisfied  19
    completely dissatisfied  8
    missing  17

13. Would you recommend your current health plan to your family or friends if they
needed care?
    definitely yes  218
    probably yes  401
    probably not  88
    definitely not  14
    missing  12

14. Do you intend to switch to a different health plan when you next have an
opportunity?
    definitely yes  22
    probably yes  62
    probably not  406
    definitely not  218
    missing  25

15a. Routine care
    same day  19
    1-3 days  112
    4-7 days  121
    8-14 days  110
    15-30 days  118
    31-60 days  94
    61 days or longer  72
    no experience  63
    missing  24

15b. Minor illness or injury
    same day  166
    1-3 days  320
    4-7 days  61
    8-14 days  20
    15-30 days  7
    31-60 days  0
    61 days or longer  0
    no experience  127
    missing  32

15c. Chronic or ongoing condition
    same day  101
    1-3 days  165
    4-7 days  84
    8-14 days  32
    15-30 days  21
    31-60 days  14
    61 days or longer  5
    no experience  256
    missing  55

15d. Urgent care
    same day  341
    1-3 days  37
    4-7 days  6
    8-14 days  0
    15-30 days  2
    31-60 days  0
    61 days or longer  0
    no experience  289
    missing  58

16. Over the last 12 months, when calling for medical information or advice, how long did it
usually take for your provider’s offices to return your call?
    no experience  236
    less than 1 hour  72
    1 hour but less than 4 hours  205
    4 hours but less than 7 hours  113
    7 hours but less than 24 hours  41
    24 hours or more  42
    missing  24

17. Over the last 12 months, once you got to your provider’s office, how long did you
usually have to wait to see your provider when you had an appointment for care?
    no experience  36
    less than 10 minutes  72
    10-15 minutes  275
    16-30 minutes  226
    more than 30 minutes but less than 45 minutes  52
    45 minutes to 1 hour  39
    1 to 2 hours  11
    2 hours or more  2
    missing  20

18. Over the last 12 months, when you went for medical care, how often did you see the
same provider?
    no experience  33
    always  405
    most of the time  224
    sometimes  43
    rarely or never  11
    missing  17

32a. On how many of these visits were you advised to quit smoking?
    no experience  23
    1 visit  40
    2 to 4 visits  33
    5 to 9 visits  9
    10 or more visits  6
    missing  4
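
One simple consistency check that a reader of a verification report like this could apply is to confirm that the response counts for each item asked of all respondents add up to the same total. The sketch below does this for items 7a, 9a, and 10, using the counts shown above. The check itself is an assumption made for illustration; the report does not describe it. The total of 733 is not stated in the report and is simply the sum of the 7a counts. The function names are hypothetical.

```python
def total_responses(counts):
    """Sum the response counts for one survey item."""
    return sum(counts.values())


def inconsistent_items(items, expected_total):
    """Return labels of items whose counts do not add up to expected_total."""
    return [label for label, counts in items.items()
            if total_responses(counts) != expected_total]


# Counts copied from the sample report for three items asked of all respondents.
items = {
    "7a": {"no experience": 16, "poor": 33, "fair": 75, "good": 199,
           "very good": 219, "excellent": 184, "missing": 7},
    "9a": {"big problem": 36, "small problem": 79,
           "not a problem": 585, "missing": 33},
    "10": {"yes": 126, "no": 588, "missing": 19},
}

# Derived here from item 7a; not a figure stated in the report.
expected_total = total_responses(items["7a"])

print("Respondents implied by item 7a:", expected_total)
print("Items with a different total:", inconsistent_items(items, expected_total))
```

Follow-up items asked only of a subset of respondents, such as 10a and 32a, would need their own expected totals and are left out of this sketch.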


